

When you cannot measure what you are speaking
about, when you cannot express it in numbers,
your knowledge is of a meager and unsatisfactory
kind; it may be the beginning of knowledge, but
you have scarcely in your thoughts advanced to
the stage of a science, whatever the matter may be.

— Lord Kelvin, 


When the facts are gathered or discovered, when 
they are disentangled and identified, when they 
are sifted and verified, when they are counted and 
measured, the real task of the scholar is not ended 
— it is not even begun, but only prepared for him.

— MacIver
PREFACE


This book is intended for the use of readers who are inter- 
ested in the understanding of statistical methods, and in their 
application in various fields, especially the social sciences. Conse- 
quently, the illustrative material has been drawn mainly from the 
fields of economics, sociology, and business, but occasionally from
others. 

The arrangement of the topics treated in this volume is about the 
same as in the authors’ earlier book, Practical Business Statistics.
The present text, however, does not stress business applications of 
statistical methods, but does present a greatly amplified treatment 
of analytical methods. The extensive discussion of the description 
and analysis of statistical data and of the making of statistical in- 
ferences will, we hope, make it useful to a wide group of teachers 
emphasizing various aspects of statistics. 

Although the treatment of the material in this text is intended 
to be at an elementary level, probably its scope is so great that it 
cannot be covered adequately in most introductory courses. Thus, 
many instructors will deem it advisable to omit certain material in
the first course and reserve it for a second, more advanced or more 
specialized course. Among the chapters which may be completely 
omitted in a short course without disturbing the continuity of treatment
are Chapters XIII, XVI, XVIII, XXIII, and XXIV. Parts
of other chapters may, of course, be left out at the discretion 
of the instructor. In the table of contents, chapters or sections 
which could well be omitted from an elementary course have been 
starred. 

One problem of the statistician, especially the teacher, is that of 
selecting symbols which are simple and clear. Some of the symbols 
used in the present volume differ from those used in the authors’ 
earlier text. This departure was made in an attempt to arrive at
symbols which would be more easily understood and which would, 
consequently, facilitate the teaching process. 

If this volume has any merit, it must be at least partly because of 
those who first introduced the authors to statistical methods; be- 
cause of those texts (and other publications) which have preceded 
this; because of those publishers and individuals who have allowed 
us to reproduce charts or data of particular value, specific acknowledgment
of which is made in the appropriate connection; and because
of numerous individuals who have assisted with the task of
completing the book. The authors especially extend their thanks 
to Mr. Morton S. Nagelberg, who assisted in the construction of 
charts and in connection with certain of the mathematical develop- 
ments; to Dr. James D. Paris, who assisted and advised concerning 
a number of charts; to Brant Bonner, Herbert Wolf, and John W. 
Gunter, for aid in making computations and preparing charts; and 
to Rosetta R. Croxton, for assistance in the reading of proof. 

Frederick E. Croxton
Dudley J. Cowden 
INTRODUCTION


Statistical Data and Statistical Methods 

The term statistics is used in either of two senses. In common parlance 
it is generally used synonymously with the term data. Thus someone 
may say that he has seen “statistics of industrial accidents in the United
States.” It would be conducive to greater precision of meaning if we
were not to use statistics in this sense, but rather to say “data of (or figures
concerning) industrial accidents in the United States.”

“Statistics” also refers to the statistical principles and methods which 
have been developed for handling numerical data and which form the 
subject matter of this text. Statistical methods (or statistics) range 
from the most elementary descriptive devices, which may be understood 
by anyone, to those extremely complicated mathematical procedures 
which are comprehended by only the most expert theoreticians. It is 
the purpose of this volume not to enter into the highly mathematical and 
theoretical aspects of the subject but rather to treat of its more elementary 
and more frequently used phases. 

Statistics (that is, statistical methods) may be defined as the collection,
presentation, analysis, and interpretation of numerical data. The facts
which are dealt with must be capable of numerical expression. We can
make little use statistically of the information that dwellings are built of 
brick, stone, wood, etc.; however, if we are able to determine how many,
or what proportion of, dwellings are constructed of each type of material,
we have numerical data useful for statistical analysis. 

Statistics should not be thought of as a subject correlative with physics, 
chemistry, economics, and sociology. Statistics is not a science; it is a
scientific method. The methods and procedures which we are about to 
examine constitute a useful and often indispensable tool for the research 
worker. Without an adequate understanding of statistics the investigator
in the social sciences may frequently be like a blind man groping in a dark
closet for a black cat that isn't there. The methods of statistics are useful
in an ever-widening range of human activities, in any field of thought in
which numerical data may be had. 

The derivation of the word “statistics” suggests its origin. The administration
of states required the collection and analysis of data of population
and wealth for purposes of war and finance. Gradually data of
more diverse nature were obtained for the general uses of government. 
Certain phases of statistics were developed by students of games of chance. 
Insurance and biology, as well as other natural sciences, were fertile fields 
for the application and development of statistical methods. Today there 
is hardly a phase of human activity which does not find statistical devices 
at least occasionally useful. Economics, sociology, anthropology, business,
agriculture, psychology, and education all lean heavily upon statistics.
The medical research worker often must rely upon statistics
to determine the significance of his results. The lawyer, especially if he 
be in corporation practice, may frequently find statistical devices of defi- 
nite use. It should, of course, be added that the musician, the artist, the 
actor, and the writer of fiction would rarely have occasion to use statistics, 
but even here certain data of sales, box-office receipts, and trends of 
popular taste might be apropos. 

In defining statistics it was pointed out that the numerical data are collected,
presented, analyzed, and interpreted. Let us briefly examine each
of these four procedures. 

Collection. Statistical data may be obtained from existing published 
or unpublished sources, such as government agencies, trade associations, 
research bureaus, magazines, newspapers, individual researchers, and 
elsewhere. On the other hand, the investigator may collect his own 
information, going perhaps from house to house to obtain the data. The 
first-hand collection of statistical data is one of the most difficult and 
important tasks which a statistician must face. The soundness of his pro- 
cedure determines in an overwhelming degree the usefulness of the data 
which he obtains. 

The following chapter treats of these two methods of obtaining data. 
It should be emphasized, however, that the investigator who has experi- 
ence and good common sense is at a distinct advantage if original data must 
be collected. There is much which may be taught about this phase of 
statistics, but there is much more which can be learned only through 
experience. Although a person may never collect statistical data for his 
own use and may always use published sources, it is essential that he have 
a working knowledge of the processes of collection and that he be able to 
evaluate the reliability of the data he proposes to use. Unreliable data do 
not constitute a satisfactory base upon which to rest a conclusion. 

It is to be regretted that many people have a tendency to accept statistical
data without question. To them, any information which is presented
statistically is regarded as correct; the mere fact that a statement
has been put in definite quantitative terms is sufficient to establish its 
authenticity. For this reason it behooves the research worker to do his 
work with the greatest possible care, in order that his conclusions may 
be valid. Above all, the basic data which are collected must be as accurate 
and comprehensive as is feasible. Let it not be said, as Stuart Chase 
commented in a book review: 

The learned economists today make graphs, charts, index numbers, of
things which they have inadequately observed. Their mathematics has
run ahead of their science.

Presentation. Either for one's own use or for the use of others, the
data must be presented in some suitable form. Usually the figures are 
arranged in tables or represented by graphic devices as described in Chap- 
ters III to VI. 

Analysis. In the process of analysis, data must be classified into useful 
and logical categories. The possible categories must be considered when 
plans are made for collecting the data, and the data must be classified as 
they are tabulated and before they can be shown graphically. Thus the 
process of analysis is partially concurrent with collection and presentation. 

There are four important bases of classification of statistical data: (1) 
qualitative, (2) quantitative, (3) chronological, and (4) geographical, each 
of which will be examined in turn. 

Qualitative. When, for example, employees are classified as union or 
non-union, we have a qualitative differentiation. The distinction is one 
of kind rather than of amount. Individuals may be classified concerning 
marital status, as single, married, widowed, divorced, and separated. 
Farm operators may be classified as full owners, part owners, managers, 
and tenants. Rubber may be designated as plantation or wild, according 
to its source. 

Quantitative. When items vary in respect to some measurable characteristic,
a quantitative classification is appropriate. The United States
Census of 1930 reports 12,351,549 non-farm homes which were rented. 
The distribution of monthly rentals is shown in Table 1. 

Families may be classified according to the number of children. Manu- 
facturing concerns may be classified according to the number of workers 
employed, and also according to the value of goods produced. 

Most quantitative distributions are frequency distributions. The data 
of rented non-farm homes show the number (frequency) of homes falling 
in each rental category. Similarly, the data of Table 27 show a frequency
distribution of the grades of the 1937 class of the United States Naval
Academy. A number of other frequency distributions are shown in
Chapters VIII, IX, and X.

TABLE 1

Non-farm Homes in the United States
Classified by Amount of Monthly
Rental, 1930

Monthly rental            Number of non-farm homes

Under $10                        1,563,952
$10.00 to 14.99                  1,330,927
 15.00 to 19.99                  1,302,387
 20.00 to 29.99                  2,545,208
 30.00 to 49.99                  3,191,435
 50.00 to 74.99                  1,603,401
 75.00 to 99.99                    343,071
100.00 and over                    256,339
Rental unknown                     315,829

Total                           12,351,549

Source: Statistical Abstract of the United States, 1937, p. 50, and by
correspondence.

Sometimes, qualitatively classified data may be reclassified on a quanti- 
tative basis by making very slight changes. The assets of a bank may be 
listed in respect to degree of liquidity (cash, due from banks, United States 
securities, marketable securities, call loans, eligible paper, other loans, real
estate loans, real estate, and furniture and fixtures). Although these 
categories differ from one another in a more or less unassignable quanti- 
tative fashion, the classification is actually made upon a qualitative basis. 
If we should reclassify the bank assets according to the length of time 
required to convert each into cash, the classification would be quantitative. 
In general the assets would be in the same order as before, but a few 
specific items among the less liquid qualitative groups (for example, cer- 
tain real estate and real estate loans) would be convertible into cash in a 
relatively short time. 

Chronological. Chronological data or time series show figures concerning
a particular phenomenon at various specified times. For example, the 
closing price of a certain stock may be shown for each day over a period 
of months or years; the birth rate in the United States may be listed for 
each of a number of years; production of coal may be shown monthly for 
a span of years. The analysis of time series, involving a consideration of 
trend, cyclical, periodic (seasonal), and accidental movements, will be 
discussed in Chapters XIV to XIX. 

In a certain sense, time series are somewhat akin to quantitative distributions
in that each succeeding year or month of a series is one year further
removed from some earlier point of reference. However, periods of time
(or, rather, the events occurring within these periods) differ qualitatively
from each other. The essential arrangement of the figures in a time sequence
is inherent in the nature of the data under consideration.

Occasionally a time series may be converted into a frequency distribu- 
tion. If a railroad company has kept records of the number of railroad 
ties replaced each year, the data constitute a time series. When the same 
information is used in conjunction with the dates of installation, the life 
of the various ties may be expressed as a frequency distribution, showing 
perhaps: 

Length of life             Number of ties

4 but under 5 years               2
5 but under 6 years               5
6 but under 7 years              17
etc.                            etc.
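
The conversion just described is mechanical enough to be sketched in a few lines of Python. The sketch is illustrative only: the installation and replacement years are hypothetical, invented so as to reproduce the three classes shown above, and the name life_distribution is our own. Since only years are recorded, the life of a tie is taken here as the difference between the two years.

```python
from collections import Counter


def life_distribution(records):
    """Tabulate tie lives into one-year classes.

    records: iterable of (year_installed, year_replaced) pairs.
    Returns a dict mapping the lower class limit, in whole years,
    to the number of ties whose life falls in that class.
    """
    lives = (replaced - installed for installed, replaced in records)
    counts = Counter(lives)
    return dict(sorted(counts.items()))


# Hypothetical records: (year installed, year replaced).
records = (
    [(1920, 1924)] * 2      # 4 but under 5 years
    + [(1920, 1925)] * 5    # 5 but under 6 years
    + [(1921, 1927)] * 17   # 6 but under 7 years
)

for lower, count in life_distribution(records).items():
    print(f"{lower} but under {lower + 1} years: {count}")
```

The replacement counts arranged by calendar year form the time series; the same records arranged by length of life form the frequency distribution.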

Geographical. The geographical distribution is essentially a type of
qualitative distribution, but is generally considered as a distinct classifica- 
tion. When the population is shown for each of the states in the United 
States, we have data which are classified geographically. Although there is 
a qualitative difference between any two states, the distinction that is 
being made is not one of kind but of location. Various geographical 
series are shown in Tables 4 and 7 and in Chart 57. 

Sometimes a geographical distribution may be put into the form of a 
frequency distribution. Thus, if we had data of the yield of corn per
acre in each county of Iowa, we should have a geographical series. This 
may be put into the form of a frequency distribution by stating the number 
of counties having yields per acre of “10 and under 15 bushels,” “15 and
under 20 bushels,” etc.

The presentation of classified data in tabular and graphic form is but 
one elementary step in the analysis of statistical data. Many other proc- 
esses are described in the following pages of this book. Statistical investi- 
gation frequently endeavors to ascertain what is typical in a given situa- 
tion. Hence all types of occurrences must be considered, both the usual 
and the unusual. 

In forming an opinion, most individuals are apt to be unduly influenced 
by unusual occurrences and to disregard the ordinary happenings. In
any sort of investigation, statistical or otherwise, the unusual cases must 
not exert undue influence. Many people are of the opinion that to break 
a mirror brings bad luck. Having broken a mirror, a person is apt to be 
on the lookout for the expected “bad luck” and to attribute any untoward
event to the breaking of the mirror. If nothing happens after the mirror
has been broken, there is nothing to remember and this result (perhaps the 
usual result) is disregarded. If bad luck occurs, it is so unusual that it is 
remembered, and consequently the belief is reinforced. The scientific 
procedure would include all happenings following the breaking of the 
mirror, and would compare the “resulting” bad luck to the amount of bad
luck occurring when a mirror has not been broken. 
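
The comparison called for here amounts to setting two rates side by side. A minimal sketch in Python follows; the tallies are hypothetical, invented only to show the form of the comparison, and the variable names are our own.

```python
def rate(events, periods):
    """Proportion of observation periods in which bad luck was recorded."""
    return events / periods


# Hypothetical tallies, invented for illustration only.
weeks_after_breaking_mirror = 40
bad_luck_after_mirror = 6
weeks_without_broken_mirror = 400
bad_luck_without_mirror = 60

after = rate(bad_luck_after_mirror, weeks_after_breaking_mirror)
baseline = rate(bad_luck_without_mirror, weeks_without_broken_mirror)

print(f"rate after breaking a mirror: {after:.2f}")
print(f"rate with no broken mirror:   {baseline:.2f}")
print(f"difference:                   {after - baseline:+.2f}")
```

With these invented figures the two rates are equal, so that the six remembered misfortunes lend no support to the belief once the uneventful weeks are counted as well.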

Statistics, then, must include in its analysis all sorts of happenings. If
we are studying the duration of cases of scarlet fever, we may study what 
is typical by determining the average length and possibly also the diver- 
gence below and above this average. When considering a time series show- 
ing steel-mill activity, we may give attention to the typical seasonal pat- 
tern of the series, to the growth factor (trend) present, and to the cyclical 
behaviour. Sometimes it is found that two sets of statistical data tend to 
be associated and it behooves us to ascertain what is typical in the rela- 
tionship. In the chapter on correlation it is pointed out (p. 651) that 
there is an association between temperature and the rapidity with which 
crickets chirp. If the temperature increases, the crickets chirp faster; if 
the temperature decreases, the crickets chirp more slowly. The relation- 
ship can be expressed mathematically and we can estimate the rapidity of 
crickets' chirps from the temperature; or, conversely, lacking a thermometer,
we can make a good estimate of the temperature based upon the
rapidity of chirps. 
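
To suggest what "expressed mathematically" may mean here, the following Python sketch fits a straight line by least squares and then uses it in both directions. It is not the computation of the chapter on correlation: the four observations are hypothetical and lie exactly on a line, and the function names are our own.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical observations, invented for illustration:
# temperature in degrees Fahrenheit and chirps per minute.
temperatures = [60.0, 70.0, 80.0, 90.0]
chirps = [80.0, 120.0, 160.0, 200.0]

a, b = fit_line(temperatures, chirps)


def chirps_from_temperature(t):
    """Estimated chirps per minute at temperature t."""
    return a + b * t


def temperature_from_chirps(c):
    """Estimated temperature from c chirps per minute.

    This simply inverts the fitted line. When the points do not lie
    exactly on a line, a separate fit of temperature on chirps would
    be the usual choice.
    """
    return (c - a) / b


print(chirps_from_temperature(75.0))
print(temperature_from_chirps(140.0))
```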

Occasionally a statistical investigation may be exhaustive and include 
all possible occurrences. More frequently, however, it is necessary to 
study a smaller group or sample. If we desire to study the expenditures 
of lawyers for life insurance, it would hardly be possible to include all 
lawyers in the United States. Resort must be had to a sample; and it is 
essential that the sample be as nearly representative as possible of the entire 
group, so that we may be able to make a reasonable inference as to the 
results to be expected for an entire population. The problem of selecting 
a sample is discussed in the following chapter. In Chapters XII and 
XIII an attempt is made to determine how much reliance may be placed 
in the results obtained from a sample. We should also have some idea 
what variation in results to expect if additional samples are selected. 
These are important considerations, since an unrepresentative sample or 
an unreliable statistical measure may cause us to draw false and un- 
warranted conclusions. 
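
The variation to be expected from one sample to the next can be seen by drawing many samples from a known population. The Python sketch below does so under stated assumptions: the population of 10,000 outlays is simulated rather than real, the seed and the sample sizes are arbitrary, and the name sample_means is our own.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, rng):
    """Means of repeated simple random samples drawn without replacement."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(1936)  # fixed seed so that the run is repeatable

# Hypothetical population: annual life-insurance outlays, in dollars,
# of 10,000 lawyers. The figures are simulated, not real.
population = [rng.uniform(0, 1000) for _ in range(10_000)]

small = sample_means(population, 25, 200, rng)
large = sample_means(population, 400, 200, rng)

print("population mean:", round(statistics.mean(population), 1))
print("spread of means, samples of 25: ", round(statistics.stdev(small), 1))
print("spread of means, samples of 400:", round(statistics.stdev(large), 1))
```

The means of the larger samples cluster more closely about the population mean, which is the sense in which a larger sample, properly drawn, deserves more reliance.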

Sometimes the statistician is faced with the task of forecasting. He 
may be required to prognosticate the sales of automobile tires a year hence, 
or to forecast the population some years in advance. Several years ago 
a student appeared in a summer session class of one of the writers and in 
a private talk announced that he had come to the course for a single purpose:
to get a formula which would enable him to forecast the price of
cotton. It was important to him and to his employers to have some ad- 
vance information on cotton prices, since the concern purchased enormous 
quantities of cotton. Regrettably, the young man had to be disillusioned. 
To our knowledge, there are no magic formulae for forecasting. This does 
not mean that forecasting is impossible; rather it means that forecasting 
is a complicated process of which a formula is but a small part. And 
forecasting is uncertain and dangerous. To attempt to say what will 
happen in the future requires a thorough grasp of the subject to be fore- 
cast, up-to-the-minute knowledge of developments in allied fields, and 
recognition of the limitations of any mechanical forecasting device. Fur- 
ther comments concerning forecasting are to be found in Chapter XXV. 

Interpretation. The final step in an investigation consists of inter- 
preting the data which have been obtained. What are the conclusions 
growing out of the analysis? What do the figures tell us that is new, that 
reinforces or casts doubt upon previous hypotheses, or (if the study is 
sufficiently inclusive) that proves or disproves former beliefs? The results
must be interpreted in the light of the limitations of the original material. 
Too exact conclusions must not be drawn from data which themselves are 
but approximations. It is essential, however, that the investigator discover
and clarify all the useful or applicable meaning which is present in
his data. 

A Few Improprieties 

The research worker must be constantly on the alert to avoid any mis- 
uses of his material. Illogical and careless reasoning or improper use of 
data will destroy the value of a study which may be technically acceptable 
in its earlier phases. A few examples of fallacious procedures may clarify 
this point. In later chapters of the book, other fallacies are occasionally 
mentioned in connection with the methods to which they apply. 

Bias. The presence of bias on the part of an investigator is, obviously, 
sufficient to discredit the entire undertaking. Bias may be conscious or 
deliberate; in such a case it is synonymous with falsification. On the other
hand, an unconscious bias may be operative, and this, perhaps, is a more 
dangerous form since the analyst himself may not be aware of it. The 
following is an illustration of apparently unconscious bias:¹

A friend had invited an acquaintance to lunch, and found at the end
of the meal that he had left his purse in the office and had no money.
The acquaintance, at his request for a loan, took out a five-dollar bill
and a ten-dollar bill. My friend took one of them — to this day he does not
know which — telling his acquaintance not to let him forget the loan. He
did forget it, however, until several weeks later when they met again, and
each wrote on a piece of paper the sum he thought had been borrowed.
The lender wrote ten, and the borrower five. They were both psychologists,
so each searched his memory carefully, and each had circumstantial
evidence that seemed to each conclusive, to prove himself right.
Neither cared about or needed the money especially, but to them it indicated
a universal principle, that each of us interprets and remembers
facts in the form most agreeable to himself. No wonder both sides must
be represented in courts of law, and that much honestly given evidence
must be rejected!

¹ From “The Mind of a ...” by Jessica Cosgrove, Good Housekeeping, January
1927, p. 206.

As will be seen in the following chapter, statistical data cannot be picked 
out of thin air as the conjurer appears to produce coins at his finger tips. 
The process is one requiring care and attention to details. The data, 
when obtained, should be of value and not be casually disregarded. Note 
what a reviewer said of a certain author: 

Blank is thorough and undaunted. Have statistics on any subject 
been collected before? He has collected more and better ones. If it is by 
its intrinsic nature unchartable, he has charted it none the less. . . . 
Chronology itself fares badly in his hands at times. If his examples 
require to be a century or two misplaced, Blank can forget even his statistics
and his charts in the good cause of logic.

Omission of important factor. Shortly after the introduction of the 
all-metal top for automobiles, a certain manufacturing company felt called
upon to prove that all-metal tops did not result in hotter car interiors. 
They suggested a test involving three steps: 

1. Take a piece of top fabric about 8 inches square. Place a piece of 
lining material of similar size beneath the fabric, and a thermometer
beneath the lining material. 

2. Take a piece of highly finished steel about 8 inches square. Place 
similar sized pieces of i-inch felt and lining material beneath the 
metal, and a thermometer beneath the lining material. 

3. Place each of the above assemblies on a board at room temperature. 
Carry the entire apparatus out into hot sunshine, leave it exposed
for about 10 minutes, and then read the temperatures of the two 
thermometers. 

The difficulty with the above experiment is that the reader is asked,
in step 2, to use a piece of highly finished steel. Automobile tops are 
painted, many of them with black or a dark color of paint, and therefore
absorb more heat than does highly finished steel. The obvious fallacy
in the test vitiates the experiment, although the additional insulation may 
actually make the metal-top car cooler than the fabric-top car. 

Carelessness. We cannot go through life without making mistakes, 
but carelessness should be reduced to a minimum. The wife of one of the 
authors wrote to a large department store to ask the size of a cedarized 
storage chest. The reply said, “This merchandise is available in the
3" X 1" X W' size.”

Many of us have received sealed envelopes minus enclosures, or postal 
cards blank on the message side, and have, perchance, been guilty of send- 
ing the grocer's bill back to the grocer minus the check or with the check 
unsigned.

A study of salaries was under way and a certain corporation had been 
requested to furnish data concerning its employees. A note to its 
report appeared substantially as follows: “All salaries under $5,000 per
annum are shown as the maximum for each type of work. The assistant
to the auditor stated that the maximum is equivalent to a general average
for each group.” Perhaps this is an illustration of a conscious bias on the
part of the assistant to the auditor. It must be obvious that, if the maxi- 
mum and the average are the same, then there are no values below the 
maximum. 
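
The last remark may be verified in a line of algebra. The notation is introduced here only for illustration and is not that of the text: let x_1, ..., x_n be the salaries in a group, M their maximum, and \bar{x} their average.

```latex
\[
M - \bar{x}
  \;=\; M - \frac{1}{n}\sum_{i=1}^{n} x_i
  \;=\; \frac{1}{n}\sum_{i=1}^{n} \left( M - x_i \right)
  \;\ge\; 0 .
\]
```

Every term M - x_i is zero or positive, so the sum can vanish only if each term vanishes; that is, the average equals the maximum only when every salary in the group equals the maximum.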

Non-sequitur. A weekly news magazine, the circulation of which had 
been growing in a healthy fashion, undertook in 1936 to demonstrate that 
its readers greatly exceeded its circulation. After showing figures of its 
circulation, the magazine stated: “And each of these subscribers represents
3.26 cover-to-cover readers, according to former Deputy Police Commissioner
______, who counted and identified [sic] 216,948 fingerprints on
copies his operatives had picked up at random from subscribers' homes in
seven different cities or towns.” How could the investigator know the
fingerprints belonged to cover-to-cover readers? Or, did he find each 
fingerprint on every page and, if so, does that prove each page was read? 
Do you ever actually read a magazine from cover-to-cover? 

Non-comparable data. In July 1936, newspapers carried reports of a 
meeting of the American College of Osteopathic Obstetricians at which a 
doctor is reported, by a metropolitan paper, to have stated that the mater- 
nal death rate among mothers treated by osteopathic physicians is less 
than half that among cases handled by the medical profession. The 
higher rate in the latter instance was said to be due to excessive use of 
anaesthetics, interruption of labor, and undue reliance on mechanical 
devices. A survey of 14,000 osteopathic delivery cases was said to show 
a maternal death rate of 2.8 per thousand cases. This figure was compared
with the nation's average of more than 6 per thousand. It should
be obvious that the average rate for the entire country is not representative
of the rate for cases attended by the medical profession, since many ma- 
ternity cases are not attended by physicians. 

The makers of a small, inexpensive car had been stressing the fact that 
the introduction of their car had converted many used-car buyers into 
new-car owners. Concerning costs of operation, they pointed out that 
“owners report up to thirty-five miles to the gallon of gasoline, which com- 
pared with the average mileage obtained with a used car ... is a saving 
of great importance to persons in the low-income group.” The comparison
of maximum mileage for one type of car with average mileage for other
types of used cars is certainly unjustified.

Confusion of association and causation. Sometimes factors which are
associated are erroneously regarded as being causally related. A southern 
meteorologist discovered that the fall price of corn is inversely related to 
the severity of hay fever cases. This does not imply that the low price of 
corn causes hay fever to be severe, nor does it imply that severe cases of 
hay fever bring about a drop in the price of corn. The price of corn is 
generally low when the corn crop has been large. When the weather 
conditions have been favorable for a bumper corn crop, they have also 
been favorable for a bumper crop of ragweed. Thus the fall price of corn 
and the suffering of hay fever patients may each be traced (at least partly) 
to the weather, but are not directly dependent upon each other. A further 
discussion of association and causation is given in Chapter XXII.

Another instance of the confusion of association with causation is illus- 
trated by Chart 1. In connection with this chart it was asserted, “When 
farm income goes up, factory payrolls invariably follow, but they do not 
lead the procession. One is cause, the other effect.” If such a procession
does exist, it can hardly be shown by annual data. If factory payrolls
follow farm income, we should show that fact by plotting monthly
data as is done for two other series in Chart 253, page 807. As to the 
causal relationship, it is fairly obvious that, while an increase (or decrease) 
in farm income does have a corresponding effect upon factory payrolls, 
the payrolls in turn have a reciprocal effect upon farm income. Further- 
more, both are dependent upon any other factors which tend to affect the 
pattern of general business. 

Insufficient data. Insufficient data result in a high degree of uncer- 
tainty respecting any conclusion which may be made from them. A very 
small sample may lead us to a correct conclusion, but we cannot be sure
of our conclusion. When a physician is developing a new treatment, he
does not announce its efficacy after trying it out on a few individuals. He
must have enough data to be relatively sure of results. If two or three
subjects respond favorably, he cannot be sure that the occurrences were 
not due to chance. The favorable responses of these few might have come 
without the treatment, or in spite of it! Of course, there must be a “con- 
trol” group to show how the subjects would respond without any treat- 
ment. Moreover, both the control group and the treated group must be 
sufficiently large to warrant a conclusion. A discussion concerning the 
size of a sample and the reliability of values computed from it is given in 
Chapters XII and XIII. 
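
How readily a few favorable responses may arise by chance can be reckoned directly. The Python sketch below assumes a simple binomial model in which each untreated subject improves independently with the same probability; the value of one-half is hypothetical, and the function name prob_at_least is our own.

```python
from math import comb


def prob_at_least(successes, n, p):
    """Probability of at least `successes` favorable outcomes among n
    independent subjects, each favorable with probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical assumption: half of untreated patients improve unaided.
p_untreated = 0.5

print(prob_at_least(3, 3, p_untreated))    # all 3 of 3 improve
print(prob_at_least(30, 30, p_untreated))  # all 30 of 30 improve
```

Under this assumption three favorable responses out of three would occur by chance one time in eight, which is far too often to support any announcement; thirty out of thirty would almost never so occur.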

Unrepresentative data. Conclusions may be based upon data which 
are numerically sufficient, but which are not representative. A small 
sample may be representative; on the other hand, a large sample may not 
be representative. 

An example of a conclusion based upon unrepresentative data is the 
forecast of the 1936 presidential election as made by the Literary Digest.
More than 10,000,000 straw ballots were sent out by the Digest. Of
these, 2,376,523 were returned and they indicated that 370 electoral votes 
would be cast for Landon and 161 for Roosevelt. The final election 
results were 523 electoral votes for Roosevelt and 8 for Landon. The
difficulty was that the mailing lists used as a basis for the poll were rela- 
tively heavily weighted with persons in the upper economic brackets 
and thus were not representative of the entire voting population. 
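
The size of the miss can be set out with the figures quoted above. This is a small arithmetic sketch, not part of the Digest's own procedure. The number of ballots sent is taken as exactly 10,000,000, although the text says "more than," so the response rate shown is an upper bound; Landon's actual count is taken as 8, the number consistent with the 531 electoral votes cast in all.

```python
# Figures for the 1936 Literary Digest poll, as quoted in the text.
ballots_sent = 10_000_000      # "more than 10,000,000": treated as a lower bound
ballots_returned = 2_376_523

predicted = {"Landon": 370, "Roosevelt": 161}
actual = {"Landon": 8, "Roosevelt": 523}

response_rate = ballots_returned / ballots_sent   # an upper bound on the true rate
total_electoral = sum(actual.values())            # electoral votes cast in all

for name in predicted:
    error = predicted[name] - actual[name]
    print(f"{name}: predicted {predicted[name]}, actual {actual[name]}, error {error:+d}")

print(f"Electoral votes in all: {total_electoral}")
print(f"Response rate: at most {response_rate:.1%}")
```

Well over two million returns did not prevent an error of several hundred electoral votes, which is the sense in which a sample may be numerically sufficient and still unrepresentative.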

Concealed classification. Conclusions drawn from statistical data may 
sometimes be invalid because of the presence of a concealed classification 
which is overlooked. The fallacy of concealed classification is illustrated 
by some data appearing in the Monthly Labor Review for February 1938 
and concerning which its readers were warned. Data were presented
showing the union wage rates in Hebrew and in non-Hebrew bakeries. It 
appeared from the figures that Hebrew bakeries paid an average hourly 
rate about 50 per cent higher than non-Hebrew bakeries. Qualifying 
this, the Review said, “Although Hebrew bakeries generally have higher 
rates, one reason for this large difference is the fact that a large proportion 
of the Hebrew bakeries are located in New York City, where the average 
of all rates is higher than in other localities.” 

A concealed classification was found to be present in a study of suicides. 
The data seemed to show that suicides were more likely to occur among 
certain religious groups than among others. Upon further consideration 
it was apparent that the matter of the urban or rural occurrence of the 
suicides had been overlooked. Hence the conclusion should have been,
not that suicides tended to tie up with given religious groups, but that
suicides were more common in urban territories and that these religious 
groups were also more numerous in the cities. 
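
The bakery illustration can be imitated with made-up figures to show how a concealed classification inflates a comparison. Every number in the sketch below is hypothetical and chosen only for illustration; none comes from the Monthly Labor Review, and the helper names `weighted_mean` and `ratio` are likewise introduced here.

```python
# Hypothetical data: (group, locality, number of bakeries, average hourly rate in dollars).
rows = [
    ("Hebrew",     "New York City", 80, 1.20),
    ("Hebrew",     "Elsewhere",     20, 0.80),
    ("Non-Hebrew", "New York City", 20, 1.10),
    ("Non-Hebrew", "Elsewhere",     80, 0.75),
]

def weighted_mean(selected):
    """Average rate weighted by the number of bakeries."""
    total = sum(n for _, _, n, _ in selected)
    return sum(n * rate for _, _, n, rate in selected) / total

def ratio(locality=None):
    """Hebrew rate divided by non-Hebrew rate, optionally within one locality."""
    pick = [r for r in rows if locality is None or r[1] == locality]
    hebrew = weighted_mean([r for r in pick if r[0] == "Hebrew"])
    other = weighted_mean([r for r in pick if r[0] == "Non-Hebrew"])
    return hebrew / other

print(f"All localities together: {ratio():.2f}")
print(f"New York City only:      {ratio('New York City'):.2f}")
print(f"Elsewhere only:          {ratio('Elsewhere'):.2f}")
```

In these invented figures the over-all ratio is much larger than the ratio within either locality, because one group is concentrated where all rates are high. Classifying by locality before comparing brings the concealed factor into the open.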

Research Methods

It must not be assumed that the statistical method is the only method 
to use in research; neither should this method be considered the best attack 
for every problem. Just as the carpenter has a number of tools, each 
appropriate for a different sort of operation, so the researcher can avail 
himself of various techniques which are the tools of his trade and each of 
which is appropriate to a specific type of situation. If an amateur car-
penter uses a screwdriver in lieu of a chisel, the results are not likely to
be either workmanlike or satisfactory. Similarly, it is important that the 
investigator consider his problem carefully at the outset and make use of 
the technique or techniques which are appropriate to it. Just as the 
carpenter needs to use more than one tool in completing a piece of work, 
so the research worker must often make use of, not one, but several 
methods.²

When we desire a great deal of information concerning each individual 
or occurrence to be studied, much of our data may be non-quantitative 
by its very nature. In such an event we employ the case method of investi-
gation, the purpose of which is to study in detail the characteristics peculiar
to the individual case and to generalize from a number of such detailed 
studies. Some of the information obtained in a case study (such as wages, 
number of offspring, etc.) may be statistical and when many cases are 
included, statistical summaries may be made of the non-quantitative 
information obtained. 

Sometimes a problem may be attacked by the historical approach. 
Although the historical method is largely descriptive and non-quantitative, 
we may find statistical aspects when we consider growth or decline of 
imports, exports, population, and other series. 

Again, the appropriate procedure may be to make use of the experi-
mental method, in which we allow only the factor we are studying to vary,
and thus we attempt to control as many as possible of the other factors. 
For example, if we wished to study the effect of car weight upon tire 
mileage, we should control road conditions, speed, temperature, size of
tire, quality of rubber and of cord, inflation of tire, and many other factors. 

In the social sciences, the experimental method can rarely be applied 
and certain aspects of the statistical method are used in lieu of it. We 
cannot, for example, ascertain the effect of different sorts of diets upon 
length of life, by forcing groups of people to live upon prescribed diets
and by actually making all other phases of their lives identical. Instead, 


² Various methods are described in Manuel Conrad Elmer, Social Research, Prentice-
Hall, Inc., New York, 1939, and in Walter Earl Spahr and Rinehart John Swenson,
Methods and Status of Scientific Research, Harper and Brothers, New York, 1930.
we must find groups of people on different diets, and then we must measure
the importance of and control statistically as many as possible of the other 
phases of their lives since we cannot control them experimentally. The ex- 
perimental and statistical methods are not antithetical, but under practical 
conditions the statistical method supplements the experimental method. 
If an experiment could be so designed that all variables were completely 
controlled, statistics might not be needed. At best we can usually control 
but a few of the more important factors, and thus it is necessary to evaluate 
statistically the importance of a host of other minor disturbing factors 
(sometimes designated as “chance”), described in Chapters XII and
XIII. 

Some problems may be approached by the deductive method rather than 
by the inductive method. When a hypothesis has been set up deductively 
and when quantitative data are available, statistics may enable an in- 
ductive test to be made of the hypothesis, and this test may serve to sup- 
port or discredit the hypothesis. Conversely, relationships arrived at 
statistically (as, for example, the rather close negative association found 
in some states concerning the size of farms and the value of land per acre) 
may suggest causal connections which may be worked out deductively. 
Again we have two methods which are not antagonistic, but comple- 
mentary. 

Selected References 

R. E. Chaddock: Principles and Methods of Statistics, Chapters I, II, III; Houghton
Mifflin Co., Boston, 1925. Chapter II contains illustrations of misuses of
statistical data.

F. E. Croxton and D. J. Cowden: Practical Business Statistics, Chapter I; Prentice-
Hall, Inc., New York, 1934. From the point of view of business statistics.
Gives additional illustrations of misuses of statistical data.

M. C. Elmer: Social Research, Chapters I-XV; Prentice-Hall, Inc., New York,
1939. Methods of research are discussed.

G. A. Lundberg: Social Research, Chapters IV, VIII; Longmans, Green and Co.,
New York, 1929. Methods of research.

W. E. Spahr and R. J. Swenson: Methods and Status of Scientific Research, Chapter
X; Harper and Brothers, New York, 1930. Research methods.

P. V. Young: Scientific Social Surveys and Research, Chapters V, IX, X, XIV;
Prentice-Hall, Inc., New York, 1939. Methods of research.

When a research worker desires statistical data concerning a topic of
interest, it may be that he can choose between collecting the data himself 
or obtaining the needed figures from published or unpublished compila- 
tions. If an individual or organization has prepared reliable data which 
are pertinent to the problem, it is vastly less expensive to make use of the 
existing information. Although to collect one's own data is more costly,
the procedure enables the investigator to obtain exactly that information 
which is needed to answer the specific questions that are under consider- 
ation. 

Not all readers will be faced with the problem of collecting original sta- 
tistical data; many will find it possible to refer to existing sources for 
information. The data from such sources may be evaluated and more 
intelligent use may be made of them if the research worker has some knowl- 
edge of the procedure and pitfalls involved in collecting, editing, and 
marshalling statistical data. 

An illustration cited by Stamp¹ is to the point: Harold Cox, when a
young man in India, quoted some Indian statistics to a judge. The judge 
replied, “Cox, when you are a bit older, you will not quote Indian statistics 
with that assurance. The government are very keen on amassing statis- 
tics — they collect them, add them, raise them to the nth power, take the 
cube root and prepare wonderful diagrams. But what you must never 
forget is that every one of those figures comes in the first instance from the 
chowty dar (village watchman), who just puts down what he damn pleases.”
It should be added that this story refers to the India of a day long past.
Today India has many able statisticians and an active statistical society.
Presumably the chowty dar no longer functions as the source of local sta-
tistical information.

The process of collecting statistical data will be examined first in the 


¹ Sir Josiah Stamp, Some Economic Factors in Modern Life, pp. 258-259, P. S. King
and Son, London, 1929.
following text. Later in the chapter, attention will be directed toward
the use of statistical sources. 

Collecting Statistical Data 

Method of collection. Statistical data are frequently obtained by a 
process in which the desired information is obtained from the house- 
holder, business man, or other informant, either by an enumerator who 
visits the informant and asks the necessary questions (entering the replies 
on a schedule), or by sending to the informant a list of questions (some- 
times called a questionnaire) which he may answer at his convenience. The 
data collected at each population census are obtained by the enumeration 
process. Sometimes information is obtained by registration, which means 
that the information is reported to the proper authority when (or shortly 
after) an event occurs. Thus births and deaths must be registered. In 
many states automobile accidents must be reported to the commissioner 
of motor vehicles. 

In general outline the problems of obtaining data by mailing question- 
naires, by enumeration, and by registration are similar. Under a system 
of registration there is, of course, the difficulty that many persons will
neglect to register. Constant vigilance and frequent checkups are neces- 
sary on the part of the registrar. Registration, however, is usually with 
a properly designated government official, and there is ordinarily legal 
compulsion that the data be supplied. Since most statistical information 
is obtained by mailing questionnaires or by enumeration, the balance of
this section will be devoted to the procedure for collecting data by such 
methods. 

Outline of procedure. The steps in a statistical investigation may be 
designated as follows: 

1. Laying out the general plan. 

2. Devising questions and making the schedule. 

3. Selecting the sample (if the enumeration is not to be a complete one).

4. Using the schedules to collect the data. 

5. Editing the schedules. 

6. Tabulating the data. 

7. Preparing finished tables and charts. 

8. Analysis and interpretation. 

The procedure will usually be in the order listed above, except that the 
selection of the sample may be handled as a phase of the first step, laying 
out the general plan. 

Laying out the general plan. If a topic is to be studied statistically, it 
behooves the investigator to become familiar at the outset with what has 
already been done by others. He may find that someone else has already
examined the same topic and that his questions have already been an-
swered. He may wish to design his study so that it can be compared with
those which have preceded his. He will doubtless profit by the experience 
and the mistakes of others. He may find that the difficulties involved in 
the investigation of his topic are so great that they are insurmountable; 
the cost may be too great, or it may appear that informants do not wish to 
divulge the type of information which is needed. 

Having studied what has been done by others, the investigator is ready 
to consider the general aspects of what he would like to know. If an un- 
employment study is projected, there are many inquiries concerning each 
individual which are pertinent. The following suggests some of the more 
important ones: 

Does the individual have any dependents? How many? 

Is the person male or female? 

Is he married? 

How old is the person? 

Is he native white, native colored, or foreign born? If foreign born,
from what country? 

Does he own property? 

What is his usual occupation? In what industry? 

What type of work is he doing at present? (If the study is a detailed 
one, consideration may be given to listing the job experience of the
individual for a number of years, together with the wages received.) 

Is he employed full time? Part time; if so, what fraction? Is he
entirely unemployed?

If the individual is working part time or is totally unemployed, what is 
the reason? 

If he is totally unemployed, how long has he been so? Also, is he able 
to work and willing to work; or, alternately, is he actively looking for 
work? 

The reader will doubtless think of other questions of importance, but
these suffice to indicate the nature of this preliminary step. Usually we 
cannot undertake to obtain answers to all the questions which are im- 
portant. It may be too expensive to make so comprehensive an inquiry. 
There may be some questions (such as the one in regard to property owner-
ship, and the one in regard to wages) which informants will often decline
to answer. The most important and practicable questions are therefore 
selected to form the basis of the inquiry. It is these which will be incorpo- 
rated into the schedule. 

There are several matters of general importance which are often con-
sidered in connection with laying out the general plan. One of these has
to do with the extensiveness of the study. Will it include the entire com-
munity or merely a sample? If funds and enumerators are available, we
may make a complete enumeration; often we must be satisfied with a 
sample. We shall consider the selection of the sample after we have 
completed the discussion of the schedule. 

Another problem concerns whether the schedule is to be sent out by mail 
(in which case it must be very simple and self-explanatory) or whether 
enumerators are to be used. If use is to be made of paid enumerators, 
it is necessary to locate qualified persons. However, it is often true that 
funds are not available to hire enumerators. In fact, it is sometimes the 
case that, valuable as the results of an investigation might be, they are not 
worth what it would cost to employ enumerators! Studies have been 
made using, as unpaid enumerators, policemen, college students, postmen, 
truant officers, and even school children. 

A third matter has to do with the place where the informants will be 
interviewed. In the case of the unemployment study we could send 
enumerators to interview people at their work, in the streets, or at home. 
It is obvious that the last of the three is preferable. For the unemploy- 
ment study we should also consider whether to list on our schedule all the 
people in a household, irrespective of age, sex, desire for work, and mental 
or physical condition. To list everyone would give us a complete picture, 
but it would also clutter up the schedule with relatively useless information. 
For the purposes of an unemployment study we are ordinarily not inter- 
ested in housewives who seek no work outside the home or in young 
children. We may be interested in elderly men, in an attempt to learn 
what proportion of the population is retired or is considered too old or in- 
firm to work. Thus it may be desirable to exclude all persons below (say) 
16 or 18 years of age, and all females not usually employed. 

Devising questions and making the schedule. It has already been 
pointed out that not all the questions which we would like to have answered 
can be included in the schedule. Having selected those points which we 
wish to include in our inquiry, we must formulate each question so that 
it may be readily and accurately answered, and then we must draft the 
schedule form. The accompanying schedule form used in unemployment 
studies in Buffalo, New York, shows how some of the questions concerning 
unemployment may be worked into a very simple schedule form.² Shown

² From Frederick E. Croxton, Unemployment in Buffalo, November 1932, Special
Bulletin No. 179, Division of Statistics and Information, New York State Department
of Labor.

The Buffalo study will be referred to frequently to illustrate various points in this 
chapter. No inference should be drawn from this that it is considered a model study.
It is, however, simple enough in its general outlines to facilitate explanation of numer- 
ous principles and methods. 

Schedule used in the Buffalo unemployment studies, 1930-1932. The forms were printed on cards 5 × 8 inches in size.

also is the schedule used in the 1930 Census of Population. It so happens
that both of these inquiry forms are set up in columnar arrangement
with a line for each individual. Schedules are not necessarily made in 
this fashion. The schedule used for a study of accidents is merely a care- 
fully arranged series of questions; so also is the form used in one of the 
inquiries made by Hartwell, Jobson, and Kibbee concerning food. The
question form used in a mail inquiry must be even easier to understand 
than the schedule used by enumerators. Business houses find it pays to 
send out questionnaires that are attractive and interesting, and that 
require a minimum of effort in answering, for by so doing they receive a 
larger proportion of replies from their mailing lists.

B. L. S. 139

U. S. DEPARTMENT OF LABOR                    RECORD OF ACCIDENT
BUREAU OF LABOR STATISTICS

Establishment No. ....  Date ....  Hour ....  Age ....  Sex ....  Married? ....
Dependents, how many? ....  Speak English? ....  Race ....  Dept. ....
Occupation ....  Worked for company how long? ....
Had the injured worked in the industry elsewhere? If so, how long? ....
Machine, tool, appliance, object, or condition in connection with which accident occurred? ....
Describe in full how the accident happened ....
What part of the body was injured? ....
Was the injury an abrasion, bruise, cut, laceration, puncture, burn, scald, concussion, dislocation,
fracture, sprain, strain, dismemberment by the accident, nervous shock, or other? ....
Did the injury become infected? ....
Results of injury: DEATH? ....  PERMANENT DISABILITY? If so, state nature ....
TEMPORARY DISABILITY? ....  Days lost ....

SPACES RESERVED FOR CODES

Serial No. ....  Dept. ....  Year ....  Month ....  Day of week ....  Hour ....
Age ....  Sex ....  Conj. cond. ....  Depend. ....  English ....  Race ....
Experience ....  Occ. ....  Cause ....  Cause anal. ....  Part ....  Mode ....
Location ....  Nature ....  Result ....  Per. dis. ....  Temp. dis. ....  Time ....

Schedule Used by the United States Bureau of Labor Statistics in a Study of Industrial
Accidents.

Observe that there are notations to assist the enumerators at the bottom 
of the population schedule. Instructions were given in a separate booklet, 
in which 33 pages were devoted to the population schedule. A separate 
sheet of instructions was furnished to the enumerators in the Buffalo 
unemployment study. 

The construction of statistical schedules is something which is learned 
most satisfactorily by actually making and using them. Nevertheless, 
there are some cautions which are helpful: 

1. Clarity is essential. The entire schedule as well as each question
should be as simple and as clear as possible. This is particularly true of
schedules (sometimes called questionnaires) sent to, or left with, persons
to be filled out at their convenience. An ambiguous question or a question
that invites an ambiguous answer produces useless data and involves 
wasted time and money. An organization, in making a study, queried 
some hundreds of parents: “Is your child’s outlook on life broader or nar- 
rower than yours was at the same age?” The investigator presumably 
expected the replies to read “Broader” or “Narrower.” Replies actually 


Int ....  Date ....  Com ....  Econ ....  Hse ....  Tel ....  Ref ....
Car ....  Serv ....  Sex ....  Col ....  Age ....  Occ ....  Agr ....

FOOD INDUSTRIES

1. Have you tried frozen foods? ....
2a. If yes: Were they satisfactory? Yes ( ) No ( )
2b. If no: Why not? ....
3. What canned foods are most nearly as good as when fresh? ....
4. Is it dangerous to leave food overnight in an opened can? Yes ( ) No ( )
5. Do you prefer milk in bottles ( ) or cartons ( )?
6. How do you prefer to buy fresh coffee:
   In sealed tins ( )
   In dated packages ( )
   Ground when bought ( )
   In the bean for home grinding ( )

Schedule Used by Hartwell, Jobson, and Kibbee (Public Relations Counsel) to Obtain
Data for a Food Industry Trade Journal. Data are collected entirely by interviewers.
Entries following the abbreviations at the top of the schedule are to assist in selecting
a sample which will properly represent all relevant strata of the population. The
meanings of the abbreviations are as follows: Int, name of interviewer; Com, size of
community; Econ, economic group; Hse, type of house; Tel, telephone; Ref, refriger-
ator; Serv, servants; Col, color or race; Occ, occupation; Agr, agreeableness of person
interviewed.

received, however, were frequently “Yes,” “I doubt it,” and “I
hope so,” none of which had any meaning. Furthermore the question
is so worded as not to allow for the fact that there may be two or more 
children in the family. The inquiry concerning marital condition when 
put “Married or Single?” is open to two objections: (1) Either a “Yes” or
“No” answer is meaningless; (2) not all persons are included in these two 
categories. One good way of asking this question is to say: 

Check whether: 

Single

Married

Widowed

Divorced

Separated

The investigator should not be satisfied merely with wording his questions 
so that they can be understood; he should draft them so carefully that they 
cannot be misunderstood. 

2. Not all questions can be accurately answered. No matter how 
clearly a question is stated, there are some sorts of inquiries which are apt 
to elicit unsatisfactory returns. The schedule of the United States Census 
of Population asks for age at last birthday for each person enumerated. 
Reference to the 1930 Population Volume II, Tables 20 and 21 and chart 
on page 571, shows a peculiar distribution of the population by one year age 
groups. Beginning with age 30 and continuing through age 80, there 
are definite concentrations of persons on every age ending in 0 or 5. For 
example, there are more people reported as 35 than there are as either 34 
or 36. There are secondary concentrations upon certain ages which are 
a multiple of 2, most noticeable when these even numbers of years are 
not adjacent to an age ending in 5. Thus there are concentrations at 28, 
32, 38, 42, 48 and so on through 72. Furthermore, there seem to be too 
many males aged 21 and too many females aged 18. In its instructions 
to enumerators the Census warns that many persons will report age in 
round numbers and says, “Therefore, when an age ending in ‘0’ or ‘5’ is
reported, you should inquire whether it is the exact age. If, however, it 
is impossible to get the exact age, enter the approximate age rather than 
return the age as unknown.” 

The rounding of ages is not peculiar to the Census. Some of the factors 
believed to lead to reporting ages in round numbers are: (1) The informa-
tion concerning an individual is not necessarily furnished to the enumerator
by the person himself; it is often given by a relative, friend, landlady, or 
other person, and some of these informants cannot have exact information. 
(2) When ages are intentionally misstated, as they occasionally are, there
is reason for believing that they are often rounded. (3) Some persons 
are careless, or occasionally a person of low intelligence may always think 
in terms of round numbers. The Census notes that the rounding is most 
noticeable for those classes of the population in which the proportion of 
illiterates is greatest. (4) A few persons do not know their exact ages. 
(5) There may be carelessness on the part of enumerators. Some im- 
provement in the accuracy of reporting ages may be had by asking date of 
birth instead of, or in addition to, age. It should be recognized, however, 
that the posing of a more exact question does not produce better data 
when exact knowledge is lacking, as in the case of a landlady reporting 
for her roomers. Furthermore, the matter of the expense involved in 
asking this additional question might more than offset the expected in-
crease in accuracy. When age is of primary importance, as in the case
of application for insurance, date of birth is usually asked and may be 
verified by documentary evidence. 

Another interesting example of thinking in terms of round numbers 
occurred in the case of a contest sponsored by a motion picture theatre. 
An irregular-shaped glass jar was filled with cranberries and six prizes were 
offered to the patrons who guessed most nearly the correct number of 
cranberries in the jar. An analysis of the 1,996 guesses showed that there 
were 1,465 which ended in 0 or 5. 
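
How far the cranberry guesses depart from what would be expected without any preference for round numbers may be reckoned as follows. The sketch assumes, as a simplification not stated in the text, that in the absence of rounding each of the ten final digits would be equally likely; the standard error used is that of a proportion under this assumption.

```python
from math import sqrt

guesses = 1996          # total guesses analyzed (from the text)
ending_0_or_5 = 1465    # guesses whose final digit was 0 or 5 (from the text)

observed_share = ending_0_or_5 / guesses

# Assumption: with no digit preference, each final digit is equally
# likely, so 0 and 5 together would account for 2 guesses in 10.
expected_share = 2 / 10

# Standard error of a proportion under that assumption, and the number
# of standard errors by which the observed share exceeds the expected.
std_error = sqrt(expected_share * (1 - expected_share) / guesses)
excess = (observed_share - expected_share) / std_error

print(f"Observed share ending in 0 or 5: {observed_share:.1%}")
print(f"Expected share without rounding: {expected_share:.0%}")
print(f"Excess, in standard errors:      {excess:.0f}")
```

Nearly three-fourths of the guesses ended in 0 or 5 against an expected one-fifth, a discrepancy far too large to be attributed to chance.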

3. Certain types of questions should be avoided. When the prosecuting
attorney asked the alleged wife beater, “Have you stopped beating your
wife?” he attempted to put the defendant, whether he replied “Yes” or
“No,” in the position of admitting that he had beaten his wife. In a
scientific investigation we should scrupulously avoid leading questions. 
When asking the reason for unemployment in 1932, an enumerator would 
have been suggesting the answer if he had said, “I suppose you are unem-
ployed because of the depression?” Rather he should have inquired,
“What is the reason you are unemployed?”

Questions which are unduly inquisitive or which are liable to offend 
should likewise be avoided. In a study of social workers each married 
woman was asked whether or not she lived with her husband. The in-
quiry was injudicious, aroused resentment, and would hardly have been
productive of useful data if it had been answered by all the persons queried. 
Questions concerning personal matters (such as income) should be handled 
with tact, perhaps asked at the close of the interview after the cooperation
of the informant has been secured. Sometimes it is better not to ask such
a question but to infer the general income level from knowing if there is a
telephone in the home; if the home is owned, and its apparent value; the
wage earner's occupation; make of car(s) driven, if any; servants employed,
if any; etc. 

4. Answers should be objective and capable of tabulation. When making 
factual studies, questions should be so designed that objective answers 
will be forthcoming. Instead of asking the condition of a building and 
allowing the enumerator to state the condition in his own words, the Real 
Property Inventory (United States Department of Commerce, 1934) asked 
if a structure was in good condition, needed minor repairs, needed structural 
repairs, or was unfit for use. Although the answers to these questions are 
not completely objective, at least they are capable of being readily tabu-
lated.
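
The advantage of fixed answer categories for tabulation may be seen in a small sketch. It uses the marital-condition categories suggested under the first caution; the replies shown are hypothetical, and the function name `tabulate` is introduced here only for illustration.

```python
from collections import Counter

# Categories taken from the marital-condition question suggested earlier.
CATEGORIES = ["Single", "Married", "Widowed", "Divorced", "Separated"]

def tabulate(replies):
    """Count replies by category; anything outside the list is set aside
    for editing rather than silently counted."""
    counts = Counter({c: 0 for c in CATEGORIES})
    rejected = []
    for reply in replies:
        if reply in CATEGORIES:
            counts[reply] += 1
        else:
            rejected.append(reply)
    return counts, rejected

# Hypothetical replies, including one that cannot be tabulated.
replies = ["Married", "Single", "Married", "Yes", "Widowed"]
counts, rejected = tabulate(replies)

for category in CATEGORIES:
    print(f"{category:<10}{counts[category]:>3}")
print("Needs editing:", rejected)
```

A reply such as “Yes” falls outside every category and must be referred back during editing, whereas replies in the listed categories can be counted at once.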

5. Instructions and definitions should be concise. The enumerator and 
informant should never be in doubt as to what information is desired and 
what terms or units are to be used. When inquiring as to the employment 
status of an individual (whether full time, part time, or unemployed), our 
inquiry must be as of some specific time. Thus the 1930 Unemployment 
Census of the United States considered as unemployed those gainful 
workers who were not at work on the day preceding the visit of the enumer- 
ator (or on the last previous work day, in case that day was not a regular 
working day for the person enumerated). The 1932 Buffalo unemploy- 
ment study asked employment status as of November 4, 1932. 

If information is desired as to the exact situation of a part time worker, 
it must be made clear whether the desired answer should be: (1) hours per 
day; (2) hours (or days) per week; or (3) fraction of usual full time. 

The units used in a study should be clearly understood by both the 
enumerator and the informant. If we are collecting data on coal produc- 
tion or consumption, we should state clearly whether we are referring to 
short or long tons. If we desire information as to the number of rooms 
in houses, it should be clearly understood whether or not bathrooms, 
kitchenettes, powder rooms, dressing rooms, and the like are to be counted 
as rooms. 

6. Arrangement of questions should be carefully planned. Not only 
must the questions be well arranged on the schedule form to allow proper 
space for answers, but the arrangement of the questions should be such as 
to facilitate the answering of each question in turn. If a logical flow of 
thought is involved, it should be followed in the arrangement of questions. 
Questions should not skip back and forth from one topic to another. 

After a schedule has been drafted, the desirable procedure is to try it out 
with a group, discover its shortcomings, and then revise it in the light of 
the tryout. If there is not time for a tryout, ask some competent investi- 
gators to go over it and make suggestions for its improvement. When 
the final form of the schedule has been decided upon, careful instructions
for filling it out should be prepared. If the schedules are to be mailed to 
the persons furnishing information, these directions should be as clear 
and concise as possible. If enumerators are used, the instructions to the 
enumerators should be complete in order to cover as many as possible of 
the situations which may occur in their work.

Selecting the sample. As pointed out previously, the United States 
Population Census is a complete enumeration of the people of the United 
States. By this we do not mean that not even one individual is omitted, 
because it is, of course, true that a few persons are not enumerated. How- 
ever, the intent is to include everyone, and the very few who are not in- 
cluded are likely to be those in extremely out-of-the-way places, traveling 
men with no permanent abode, tramps, etc. Similarly the Census of 
Agriculture undertakes to include all farms in the United States, and the 
Census of Unemployment (1930) attempted to embrace all unemployed 
persons. 

Sometimes it is not practicable or necessary to essay a complete enumer- 
ation. We may be satisfied to have an almost complete coverage. Thus 
the United States Census of Manufactures does not undertake to include 
all manufacturing establishments but eliminates the extremely small 
concerns. The quinquennial manufacturing censuses from 1904 to 1919 
included factories having products valued at $500 or more during the 
calendar year. Beginning in 1921, censuses of manufacturing were taken 
every two years and, at these biennial censuses, data were collected from 
only those establishments having products valued at $5,000 or more. 
The importance of this exclusion was studied in 1921 (when certain general 
data were obtained from the smaller establishments) and it appeared that, 
while the firms having products valued at $500 to $5,000 constituted 21 
per cent of the total number of establishments, they employed only six- 
tenths of one per cent of the total number of wage earners and had an 
output of but three-tenths of one per cent of the total value of products. 
Thus the enumeration was quite incomplete in respect to number of estab- 
lishments, but virtually all-inclusive in regard to number of wage earners 
and value of products. 
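
The arithmetic behind this conclusion can be set out in a few lines. The sketch below uses only the three percentages quoted above for 1921; the variable names are ours and purely illustrative.

```python
# Shares of the 1921 totals accounted for by firms with products valued
# at $500 to $5,000, as quoted in the text (per cent).
excluded_share = {
    "establishments": 21.0,
    "wage earners": 0.6,
    "value of products": 0.3,
}

# Coverage that remains when those firms are left out of the enumeration.
coverage = {item: round(100.0 - pct, 1) for item, pct in excluded_share.items()}

for item, pct in coverage.items():
    print(f"{item}: {pct} per cent covered")
```

The enumeration thus retains roughly four-fifths of the establishments but all except a fraction of one per cent of the wage earners and of the value of products.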

Although the Census of Agriculture undertakes to include all farms, it 
does not, therefore, include all land used for agricultural purposes. A 
farm, as defined by the census, is: “All the land which is directly farmed 
by one person conducting agricultural operations either by his own labor 
or with the assistance of members of his household or hired employees.” 
But, enumerators are warned, “Do not report as a ‘farm’ any tract of land 
of less than 3 acres, unless agricultural products to the value of $250 or 
more were produced on such tract in 1929.”³ It is thus apparent that 
very small plots used for agricultural purposes are not included, but the 
coverage of agricultural lands is, nevertheless, virtually complete. 

It may be too expensive or too time-consuming to attempt either a com- 
plete or nearly complete coverage in a statistical study. Furthermore, to 
arrive at valid conclusions, it may not be necessary to enumerate all or 
nearly all of a population. We may study a sample drawn from the 
larger population and, if that sample is adequately representative of the 
population, we should be able to arrive at valid conclusions. There are 
various ways in which a sample may be selected from a population. No 
matter which of these is employed, it must be remembered that the cardinal 
purpose is to obtain a representative sample, that is, one which contains 
all elements in the same proportion as in the population from which it is 
drawn. In short, it is not merely a matter of grabbing any 2, 5, 10, or 20 
per cent sample of a population, but of selecting that sample in such a way 
that it will be as representative as possible. 

1. Random sample. One method of selecting the items to comprise a 
sample consists of drawing them at random. More exactly, the items 
should be drawn independently so that each item will have an equal chance 
of being selected. When this situation holds, it is more likely that the 
sample will have the different elements in the same proportion that they 
exist in the population, than that these elements will be present in any other 
proportion. Such a situation may be rather exactly realized in drawing 
marbles from a large container (which holds, say, several thousand marbles, 
I of which are white, | black, and | red) if we draw one marble at a time, re- 
placing it after each draw and thoroughly mixing the marbles before each 
draw. It may be closely realized as we draw samples of screws, nails, 
bricks, wire, or other products from the production stream of a factory. 
It may be approximately realized in a community study, but only approxi- 
mately so because of the difficulty of setting up a selection procedure. If 
the selection of a sample is to be based upon individuals or households, it 
is necessary to have a listing of those persons or households so that the 
sample may be selected. Sometimes a city directory for individuals, or 
the list of subscribers for electricity, gas, or water for households, may serve 
as a basis, and every tenth or twentieth name (depending upon the size 
of the sample desired) may be selected. Lists such as these are obviously 
incomplete, and sometimes selectively so, in that certain categories of the 
population may be excluded and others included. The list of subscribers 
for gas and electricity does not include the poorest homes in a city and 
will, therefore, not be an adequate basis to use for selecting a sample if we 
are studying unemployment or, in fact, any other topic which requires 
proper representation of the economic levels of the population. 

³ From Fifteenth Census of the United States, 1930, Instructions to Enumerators, 
pp. 52-53. 
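
The marble illustration can be imitated with a short Python sketch. The color shares used here are an assumption made only for the example (the fractions are not legible in this copy of the text), and the function name is hypothetical. Each draw is independent and made with replacement, as described above.

```python
import random
from collections import Counter

# Assumed composition of the container; these proportions are purely
# illustrative and are not taken from the text.
population_share = {"white": 0.5, "black": 0.25, "red": 0.25}

def draw_with_replacement(shares, n, rng):
    """Draw n marbles, each draw independent with the given color shares."""
    colors = list(shares)
    weights = [shares[c] for c in colors]
    return rng.choices(colors, weights=weights, k=n)

rng = random.Random(1940)          # fixed seed so the run is repeatable
sample = draw_with_replacement(population_share, 2000, rng)
counts = Counter(sample)
sample_share = {c: counts[c] / len(sample) for c in population_share}

for color in population_share:
    print(color, population_share[color], round(sample_share[color], 3))
```

With 2,000 draws the sample shares should lie close to the assumed population shares, though they will seldom agree exactly.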

In economic and social studies it is difficult to apply the mechanical 
methods necessary to obtain a random sample. Furthermore, our prob- 
lem is complicated in that the units (persons, households, etc.) are dis- 
similar. When selecting marbles from a container, we do not care which 
white, black, or red marble we draw. We have units that differ from one 
another only in respect to color; they are made of the same material, are 
essentially the same size, shape, and weight, and have similar surfaces. 
When our units are people, we find that they differ in respect to sex, age, 
race, occupation, employment status, economic status, religion, etc. 
About all that they have in common is that they are human beings and live 
in the same community. Such differences are important and need to be 
kept in mind when a sample is selected. What has just been said should 
not be construed as a condemnation of the random sample; rather it is 
an attempt to point out the difficulty⁴ of obtaining a random sample in 
particular instances, primarily when making a community study. 

2. Stratified sample. A stratified sample differs from a random sample 
in that the population is broken into subgroups or strata before the sample 
is drawn. A random sample is then taken from each stratum. Usually 
the size of the sample from each stratum is proportional to the size of the 
stratum in the population. When a population is composed of relatively 
homogeneous units (such as the marbles referred to before), a random 
sample may be quite satisfactory. However, it frequently happens that 
a population is composed of heterogeneous units which, nevertheless, may 
be broken into rather uniform strata. The purchaser of a box of berries 
recognizes the existence of stratification when she turns out the contents 
to examine the bottom as well as the top layers. Here only two strata 
are considered. The purchaser of large quantities of coal will be apt to 
check his purchase in respect to lump size, heat content, ash content, etc. 
He is not satisfied with taking a few shovels full of coal from the top of a 
carload. The coal was probably loaded without any attempt to put small 
pieces on the bottom, but the shaking of the car on its journey from the 
mine tends to cause the smaller pieces to find their way to the bottom. 
Even though such a readjustment had not taken place, the careful buyer 
would select his sample from the middle and the ends and at various levels 
from top to bottom of the load in order to be sure of getting a sample as 
nearly as possible representative of the entire load. 

⁴ Sometimes Tippett's random numbers may be useful in selecting a sample if 
numbers can be assigned to the items in the population. See L. H. C. Tippett, The 
Methods of Statistics (2nd edition), p. 68, Williams and Norgate, London, 1937; and 
Tracts for Computers, XV, “Random Sampling Numbers,” by L. H. C. Tippett, 
Cambridge University Press, 1927. 

The recognition of the existence of strata and the selection of random 
samples from these strata (rather than from the population as a whole) 
introduce added elements of control into the selection of the sample and 
give us greater assurance of representativeness, which increases as the 
number of strata increases. It will be shown in connection with Chapter 
XII that a stratified sample is more reliable than a random sample of the 
same size from the same population. From this it follows that the same 
reliability may be had from a smaller stratified sample. There is some 
danger that investigators, having an excessive feeling of security in the 
stratified sample, may depend too greatly upon the magic of stratification 
and use samples which are too small to give statistically reliable results. 
This can be guarded against by an intelligent use of the method and of the 
reliability formula given in Chapter XII. An extremely important point, 
which is often overlooked, is that the strata must be ones which are related 
to the topic being studied. If we are making a health study of male 
students in a college, we might recognize such strata as those who do or do 
not live at home; those who are totally, partially, or not at all self-support- 
ing; those who do or do not take regular exercise; those who do or do not 
smoke; etc. However, there are other strata which clearly have no bearing 
on the problem. To take an extreme illustration, we might recognize 
such strata as those who habitually wear caps or hats, those who prefer 
single or double breasted coats, or any other categories which are not 
related to health. 
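
A proportional stratified sample may be sketched as follows. The college of 1,000 students, the single stratifying factor (living at home or away), and the function name are all hypothetical; the point is only that the same fraction is drawn at random from every stratum.

```python
import random
from collections import defaultdict

def proportional_stratified_sample(units, stratum_of, fraction, rng):
    """Take the same fraction, at random, from every stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        n = round(len(members) * fraction)
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical college: 600 students living at home, 400 living away.
students = [(i, "at home") for i in range(600)] + \
           [(i, "away") for i in range(600, 1000)]

rng = random.Random(12)
chosen = proportional_stratified_sample(students, lambda s: s[1], 0.10, rng)
at_home = sum(1 for _, place in chosen if place == "at home")
print(len(chosen), at_home, len(chosen) - at_home)
```

Because the allocation is fixed in advance, the sample of 100 must contain 60 students living at home and 40 living away; a simple random sample of 100 would match these numbers only by chance.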

The principle of the stratified sample is used by the American Institute 
of Public Opinion⁵ in its surveys conducted throughout the country. 
The Institute refers to the method in its news releases as “scientific 
sampling.” Seeking to measure public opinion not only in respect to elections 
but also on public questions of many sorts, the Institute at first used mail 
ballots, which were supplemented by direct interviews. Later the mail 
ballots were discontinued, and all opinions are now obtained through 
personal interviews. The Institute uses more than 600 field men in cities 
and rural areas throughout the nation. Voters are interviewed in the 
home, on the street, in offices, and on farms. To insure a representative 
sample, the Institute undertakes to select from the various strata a repre- 
sentative cross-section of the voters. Thus the sample (which appears 
to consist of more than 60 strata) must contain the proper proportion of: 

(1) Voters from each state (48 strata). 

(2) Men and women (2 strata). 

(3) Farm voters, voters in villages of 2,500 population or less, and voters 
in urban communities divided into four categories according to 
population (6 strata). 

(4) Voters of all age groups (presumably several strata). 

(5) Voters of above-average and below-average incomes, as well as per- 
sons on relief (3 strata). 

(6) Democrats, Republicans, and members of other political parties, 
as indicated by how each person voted at the preceding presidential 
election (at least 3 strata). 

⁵ This brief description of the procedure used by the American Institute of Public 
Opinion is based largely on a booklet issued by the Institute and entitled The New 
Science of Public Opinion Measurement. 

The character of the cross-section of the sample is felt to be of more 
importance than the number of persons included. Straw votes and other 
sampling studies which are substantially wrong are usually incorrect rather 
because the persons reached are not representative of the entire group from 
which they were drawn than because the sample was too small. The fail- 
ure of the Literary Digest’s attempt to forecast correctly the 1936 election 
was due to the fact that its more than 2,300,000 ballots were not a repre- 
sentative cross-section. The voters were drawn primarily from lists of 
automobile owners and telephone subscribers. The Institute uses a 
sample of 3,000 to 50,000 or more cases, depending upon the problem being 
studied. When reporting the attitude of voters (or occasionally of some 
other special group) on a public question, the Institute does not ordinarily 
state the number of ballots obtained on that particular question. Pre- 
sumably this practice is followed because the ordinary newspaper reader 
would think that a sample of a few thousand could hardly be depended 
upon to gauge the attitude of a nation, and he would be right if it were not 
for the careful application of the principle of the stratified sample. 
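
The contrast between a large unrepresentative sample and a small representative one can be imitated numerically. Every figure in the sketch below is invented for illustration (the size of the electorate, the division into listed and unlisted voters, and the preferences of each group); none is taken from the Literary Digest poll or from the Institute.

```python
import random

# Hypothetical electorate (all figures invented for illustration):
# 40,000 voters appear on automobile or telephone lists and 40 per cent
# of them favor candidate A; 60,000 do not appear and 70 per cent favor A.
listed = [1] * 16_000 + [0] * 24_000
unlisted = [1] * 42_000 + [0] * 18_000
everyone = listed + unlisted
true_share = sum(everyone) / len(everyone)          # 0.58

rng = random.Random(1936)
big_biased = rng.sample(listed, 20_000)             # large, but from the lists only
small_random = rng.sample(everyone, 1_000)          # small, from all voters

big_estimate = sum(big_biased) / len(big_biased)
small_estimate = sum(small_random) / len(small_random)

print(f"true share      {true_share:.3f}")
print(f"20,000 biased   {big_estimate:.3f}")
print(f" 1,000 random   {small_estimate:.3f}")
```

Under these assumptions the sample of 20,000 settles near the preference of the listed voters alone, while the sample of 1,000 falls near the true figure in spite of its size.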

The surveys conducted by the magazine Fortune also make use of 
stratified samples.⁶ These studies are samples “balanced by geography, 
by sex, by size of community, by income group, by color, by age, and by 
occupation” in order to arrive at a true cross-section. This list includes 
two classifications (color and occupation) not used by the Institute, while 
the Institute considers how each person voted at the preceding presidential 
election, which is not included by Fortune. 

Instead of selecting a proportionally stratified sample, it is sometimes 
easier to utilize a sample obtained by some other method and to adjust it 
so that each stratum will be properly represented. If all strata are repre- 
sented, but not in the same proportions as in the population, weights may 
be applied to the portions of the sample coming from each stratum, in 
order to establish these proportions. Such a procedure, however, would 
not usually be so satisfactory as taking a stratified sample; moreover, it 
would presuppose a knowledge of the strata in the population and their 
importance, the same as for a stratified sample. A similar application of 
weights may be used when a sample, intended to be proportionally strati- 
fied, does not prove to have the proper proportions from each stratum. 

⁶ “The Fortune Quarterly Survey,” Fortune, July 1938. 
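
The application of weights may be illustrated with a minimal sketch. The three income strata, their shares of the population, and the sample returns are all hypothetical figures chosen so that the highest stratum is heavily over-represented in the sample.

```python
# Hypothetical figures: the population share of each stratum, and for the
# sample the number of persons reached and the number answering "yes".
population_share = {"low": 0.50, "middle": 0.30, "high": 0.20}
sample_n = {"low": 100, "middle": 300, "high": 600}
sample_yes = {"low": 70, "middle": 150, "high": 180}

# Unweighted: every person counts once, so the over-sampled stratum dominates.
unweighted = sum(sample_yes.values()) / sum(sample_n.values())

# Weighted: each stratum's own proportion counts according to its share
# of the population, not its share of the sample.
weighted = sum(
    population_share[s] * sample_yes[s] / sample_n[s] for s in population_share
)

print(round(unweighted, 3), round(weighted, 3))
```

Here the unweighted proportion is 0.40 and the weighted proportion 0.56. The correction is only as good as our knowledge of the population shares, and the estimate for the "low" stratum still rests on no more than 100 cases.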

3. Other types of samples. Sometimes a sample is selected by design 
or, as it is often termed, the selection is purposive. When selecting such a 
sample, the investigator sets out to make his sample agree in one or more 
respects with the population. Of course, this procedure cannot be fol- 
lowed unless the characteristics of the population are known. In a study 
of a group of wage earners a sample may be picked so that the average 
weekly earnings of those included in the sample will be the same as the 
average weekly earnings of the entire group of wage earners from which 
the sample was chosen. The sample might also be so chosen that it will 
agree with the larger group in respect to the average size of the wage 
earners' families. Additional controls could, of course, be used; if they 
are relevant to the problem which is being studied, the greater the number 
of respects in which the sample agrees with the population, the more 
thoroughly representative is the sample. 

A stratified purposive sample may be employed if we first divide the 
population into strata and then endeavor to make the sample drawn from 
each stratum agree in one or more respects with all the items in that 
stratum, as well as to make the sample contain the proper number of items 
from each stratum. 

Sometimes a sample is taken in a more or less haphazard fashion. Or, 
the investigator may include the data which are convenient or readily 
available, after which he will trustingly announce that the sample so taken 
is doubtless representative of the population which he is studying. For 
example, an investigator, who had ascertained that just under 2,500,000 
children, eligible to be enrolled in high school, were not enrolled, desired 
to estimate how many of these 2,500,000 left school because of economic 
pressure. He managed to locate 16 acceptable studies concerning the rea- 
sons why students left school. These studies each included 53 to 274 
children, a total of 2,525. The studies were made in schools in 13 different 
states. Negroes were studied in one instance. There were no figures from 
New York, Massachusetts, Illinois, Michigan, Wisconsin, Texas, and 
certain other populous states. Yet, because the geographical distribution 
was diverse and because large city, small city, and rural children were 
included, the investigator concluded: “The sample seems sufficiently 
representative of the various elements of the population to serve as the 
basis for estimation of the whole group.” This may or may not have been 
true. The sample was neither random, stratified, nor purposive; it merely 
included what was available. 

As will be shown in connection with Chapter XII, the larger the sample 
(whether random or stratified), the more confidence we can place in con- 
clusions drawn from the sample. It will also be shown that the greater 
the diversity there is in the population, the less reliability we can repose 
in samples of the same size. Mere size, of course, does not assure repre- 
sentativeness in a sample. A small random or stratified sample is apt 
to be much superior to a larger but badly selected sample. Sometimes a 
test of stability is made to determine when a sample is large enough. For 
example, a sample of 1000 may be selected from a group of voters, and 57.3 
per cent may indicate they intend to vote for a certain candidate. Another 
1000 may be chosen, and the two groups combined may show 56.9 per cent. 
Adding another 1000 may change the percentage to 56.8, and another 1000 
(4000 in all) may leave the proportion unchanged, at 56.8. From this test, 
3000 or 4000 would seem to be an adequate sample from the standpoint of 
size. However, the test of stability tests only stability and not repre- 
sentativeness. The fact that a percentage persists essentially unchanged 
means merely that we are continuing to get about the same result as 
before. Conceivably the first sample of 1000 could have been decidedly 
unrepresentative (say, from only the poorer sections of the voting popula- 
tion), and each succeeding sample similarly unrepresentative. 
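
The test of stability amounts to recomputing a cumulative percentage as each batch is added. In the sketch below the batch counts are back-calculated from the cumulative percentages quoted above (the text gives only the percentages), and the function name is ours.

```python
from itertools import accumulate

# (number favoring the candidate, number interviewed) in each successive
# batch. The counts are back-calculated from the cumulative percentages
# quoted in the text; the text itself gives only the percentages.
batches = [(573, 1000), (565, 1000), (566, 1000), (568, 1000)]

def cumulative_percentages(batches):
    """Percentage favoring the candidate after each batch is added."""
    yes_totals = accumulate(yes for yes, _ in batches)
    n_totals = accumulate(n for _, n in batches)
    return [round(100 * y / n, 1) for y, n in zip(yes_totals, n_totals)]

print(cumulative_percentages(batches))
```

The computation shows only that the successive batches resemble one another. If every batch were drawn from the same unrepresentative source, the percentages would settle down just as smoothly.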

In selecting a sample it is important that bias be avoided. Bias does 
not mean the personal bias of the investigator which leads him to deliber- 
ately select his sample in order to show the results he desires. That is 
intellectual dishonesty. Neither does it mean that the persons answering 
the questions on the schedule are biased. The avoidance of bias involves, 
first, that there shall be no selective factor present in the drawing of the 
sample and, second, that there shall be no selective factor present when 
schedules are returned from those persons included in the sample. In the 
case of the Literary Digest 1936 straw vote, a selective factor was present 
because the basic lists from which the sample was selected did not include 
the lower economic levels of the population. Sometimes the basic list 
may be complete, but the method of selecting the sample may introduce 
bias. Thus a selection from an alphabetical list of names may be unsatis- 
factory because of nationality differences in the alphabetical distribution 
of family names. Such a bias may arise if sections of the list are chosen; 
it is not likely if (say) every tenth name is taken. 
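
The difference between taking a section of an alphabetical list and taking every tenth name can be seen in a small sketch. The list of 100 names (ten under each of ten initials) and both function names are hypothetical.

```python
def every_kth(names, k, start=0):
    """Systematic selection: every k-th name, beginning at position start."""
    return names[start::k]

def one_section(names, start, size):
    """A block of consecutive names from the list."""
    return names[start:start + size]

# Hypothetical alphabetical list: ten family names under each of ten initials.
names = [f"{initial}-{i:02d}" for initial in "ABCDEFGHIJ" for i in range(10)]

systematic = every_kth(names, 10, start=3)
section = one_section(names, 0, 10)

print(sorted({n[0] for n in systematic}))   # initials reached by every tenth name
print(sorted({n[0] for n in section}))      # initials reached by one section
```

Both selections contain ten names, but the section is confined to a single initial, whereas every tenth name reaches all ten.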

The second type of selective factor is frequently encountered if the 
questionnaire method of collection is used. When schedules are sent out 
by mail, an investigator never expects that all of them will be returned. 
Since only part of the inquiries are answered, how can he be sure that those 
who did answer are representative of all those to whom schedules were 
sent? Often he cannot be sure; sometimes it is obvious that they are not 
representative. An alumni association sent out 363 inquiries to graduates, 
asking each to report (anonymously) his income for the preceding year. 
Replies were received from 133. It is quite likely that a selective factor 
was present in these returns. Alumni who were out of work or who had 
very low incomes probably did not reply. This assumption is borne out 
by the data, which show an almost complete absence of incomes below 
$1,500, although the study was made in a depression year. Conclusions 
based upon biased samples are, obviously, not only useless but misleading. 
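
The rate of response in the alumni inquiry, and the way a selective factor distorts an average, can be shown briefly. The counts 363 and 133 are those given above; the seven incomes and the $1,500 cutoff for replying are hypothetical.

```python
sent, returned = 363, 133
response_rate = 100 * returned / sent
print(f"response rate: {response_rate:.1f} per cent")

# Hypothetical incomes for seven alumni; suppose only those earning
# $1,500 or more send the form back.
incomes = [800, 1000, 1200, 1600, 2000, 2400, 3000]
replies = [x for x in incomes if x >= 1500]

mean_all = sum(incomes) / len(incomes)
mean_replies = sum(replies) / len(replies)
print(round(mean_all), round(mean_replies))
```

Little more than a third of the inquiries were answered, and in the hypothetical case the average income of those replying is well above the average for the whole group.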

Using the schedules to collect the data. When agents or enumerators 
take the schedules to the persons who are to furnish the information, the 
enumerators may explain the purpose of the investigation and solicit co- 
operation. Each question can be clearly explained as it is asked. Obvi- 
ously, enumerators must be carefully instructed before they begin their 
work. Occasionally they are required to study the schedule and printed 
instructions, and then to take an examination. Enumerators should be 
persons of unquestioned integrity and should also be patient, polite, and 
tactful. Many a person resents being bothered to supply statistical (or 
other) information; some are reluctant; some refuse. The enumerator 
should plan his interviews to consume as little time as possible, and should 
bend every effort to get the desired information if it is feasible to do so. In 
some instances the work of the enumerator may be facilitated if a letter 
of explanation precedes the visit. Sometimes enumerators conduct inter- 
views and fill in the schedules afterward. This is done on the theory that 
people feel more free to talk if the remarks are not being written down at 
the time. It is believed, however, that this is an undesirable procedure, 
especially when there are a number of facts to be remembered and later 
recorded. Enumerators should carry credentials in order that the persons 
visited may be satisfied as to the official connection of the visitor. Even 
though an enumerator makes his request for information as tactfully as 
possible, he may sometimes meet with a refusal. Frequently another 
visitor with a different approach may have better luck. It is sometimes 
a good plan to have one especially qualified worker who will follow up the 
more difficult cases. 

Sometimes an enumerator may encounter a person who is too willing 
to cooperate and who wants to talk at great length about the study. In 
such a situation good terminal facilities are an asset. Carl Crow states⁷ 
that Chinese, when asked certain types of questions, are apt to give an- 
swers which they think will please the questioner. If an English investi- 
gating commission asks young Chinese where they want to go to school 
they are likely to reply, “England.” The same author tells⁸ of an investiga- 
tion made in Amoy, where, because of a lack of proper death registration, 
the number of persons dying was estimated from figures of the number of 
coffins made. The figures of coffin production mounted, showing the de- 
velopment of an epidemic; but, after the epidemic was definitely known to 
have declined, the figures of coffins made remained high. Upon close in- 
quiry it developed that the coffin manufacturers had continued to report 
peak production of coffins so that the agent of the health officials would not 
lose his job. They did not want to “break his rice bowl.” 

⁷ Carl Crow, Four Hundred Million Customers, pp. 132-133, Harper and Brothers, 
New York, 1937. 

Sending schedules by mail rather than using enumerators is, at the out- 
set, a less expensive method of collecting data. There is also the added 
advantage that the person supplying the information can fill out the 
form at his convenience, instead of being disturbed by the enumerator 
perhaps at a busy or inconvenient time. Furthermore, confidential in- 
formation may be given in a questionnaire, which the informant would 
hesitate to divulge to an enumerator, provided of course that the informant 
is sure his identity is unknown. On the other hand, a large proportion of 
persons fail to reply to a mail inquiry (particularly certain classes of 
persons), and considerable follow-up work may be necessary. There is 
also great danger that the informant will not understand the questions, 
or will knowingly or otherwise make incorrect answers. Not only must 
clear, concise directions be sent with the schedule, but also a brief letter 
explaining the purpose of the inquiry and requesting cooperation. An 
addressed and stamped (or business reply) envelope should be included. 
An air mail business reply envelope (or card) is occasionally used by 
investigators with the hope that it will result in more and quicker 
responses. When follow-up work is necessary the persons who have not 
yet returned their forms may be sent courteous personal letters remind- 
ing them of the inquiry and again requesting cooperation. When ap- 
propriate, the follow-up may be by means of air mail letters, special 
delivery letters, registered letters (to be sure the communication has been 
delivered), telegrams, or telephone calls. Of course, the investigator 
should not make a nuisance of himself; he should not be too insistent. 
When only part of the schedules are finally received, it is necessary to 
examine the situation carefully to be sure that no selective factor has been 
present. Or, if a selective factor appears to be present, it may be necessary 
to conduct a supplementary investigation to remedy the situation. 


⁸ Ibid., pp. 252-253. 
Editing the schedules. After the filled-out schedules are received, a 
certain amount of preparatory work is necessary before the data are in 
shape to be tabulated. The editorial tasks are varied. In the case of a 
small study one editor may do the entire work. In a larger study different 
phases of the editing may be portioned out among a number of editors. 

1. Computations. It is usually better not to ask enumerators or persons 
supplying information to make any computations. Thus, if information 
has been obtained concerning the number of rooms in a home and the 
number of members in the household, the editor may compute the ratio 
of persons per room, to give some idea of crowding. If data have been 
collected concerning the time lost through non-compensated accidents 
and also of daily wages for each of a number of workers, the editor may 
compute for each case the income lost because of accidents. 
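
The two computations mentioned may be sketched as follows. The function names and the schedule entries are hypothetical.

```python
def persons_per_room(members, rooms):
    """Ratio used as a rough index of crowding."""
    return members / rooms

def income_lost(days_lost, daily_wage):
    """Wages lost through non-compensated accidents."""
    return days_lost * daily_wage

# Hypothetical schedule entries.
household = {"members": 6, "rooms": 4}
worker = {"days_lost": 12, "daily_wage": 4.50}

crowding = persons_per_room(household["members"], household["rooms"])
lost = income_lost(worker["days_lost"], worker["daily_wage"])
print(crowding, lost)
```

Leaving such arithmetic to the editor means that the informant supplies only the facts he knows at first hand, and that every ratio is computed in the same way.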

2. Coding. Tabulation is frequently facilitated by coding. When 
machine tabulation (to be discussed shortly) is used, all entries on a 
schedule are reduced to a numerical code. Even when tabulation is 
manual, it may still be easier to look for a code mark — letters, numbers, 
or combination of letters and numbers — instead of attempting to read the 
original entry. The work of the tabulator may be further facilitated by 
the fact that the editor writes, or should write, legibly and uses a distinctive 
color, often red. 

The Buffalo unemployment schedule on page 36 is shown edited 
according to a numerical code. Every entry is shown numerically coded 
(except those already expressed as numbers) in order to facilitate tabula- 
tion by mechanical means. The code scheme for industries (shown in 
column 6 of the schedule) ran as follows: 

10. Professional 
20. Clerical (not otherwise specified) 
30. Domestic and personal service 
40. Government employees (other than teachers) 
Trade and Transportation 
    50. Retail and wholesale trade 
    51. Telephone and telegraph 
    52. Railway, express, gas, electric light 
    53. Water transportation 
    54. Bank and brokerage 
    55. Insurance and real estate 
    56. Other 
Manufacturing and Mechanical Pursuits 
    60. Building trades, contractors 
    61. Building trades, wage earners 
    62. Clay, glass, and stone products 
    63. Food and kindred products 
    64. Iron, steel, and their products 
    65. Metal products, other than iron and steel 
    66. Paper, printing and publishing 
    67. Wearing apparel and textiles 
    68. Automobiles, parts, and tires 
    69. Lumber and furniture 
    70. Aeroplanes 
    71. Other manufacturing and mechanical pursuits 
75. Labor (not otherwise specified) 
80. Self-employed (other than 10 or 60) 
90. Miscellaneous employments not classified above 
00. Not reported 
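
A code scheme of this kind is, in effect, a lookup table. The sketch below stores the industry codes listed above as text (so that "00" keeps its leading zero). The label "Other trade and transportation" for code 56, the function name, and the rule that a blank entry receives "00" are our assumptions for the illustration.

```python
# Industry codes from the Buffalo schedule, keyed by industry label.
INDUSTRY_CODES = {
    "Professional": "10",
    "Clerical (not otherwise specified)": "20",
    "Domestic and personal service": "30",
    "Government employees (other than teachers)": "40",
    "Retail and wholesale trade": "50",
    "Telephone and telegraph": "51",
    "Railway, express, gas, electric light": "52",
    "Water transportation": "53",
    "Bank and brokerage": "54",
    "Insurance and real estate": "55",
    "Other trade and transportation": "56",   # listed simply as "Other"
    "Building trades, contractors": "60",
    "Building trades, wage earners": "61",
    "Clay, glass, and stone products": "62",
    "Food and kindred products": "63",
    "Iron, steel, and their products": "64",
    "Metal products, other than iron and steel": "65",
    "Paper, printing and publishing": "66",
    "Wearing apparel and textiles": "67",
    "Automobiles, parts, and tires": "68",
    "Lumber and furniture": "69",
    "Aeroplanes": "70",
    "Other manufacturing and mechanical pursuits": "71",
    "Labor (not otherwise specified)": "75",
    "Self-employed (other than 10 or 60)": "80",
    "Miscellaneous employments not classified above": "90",
    "Not reported": "00",
}

# Reverse table, for reading a coded schedule back into words.
LABEL_FOR = {code: label for label, code in INDUSTRY_CODES.items()}

def code_for(industry):
    """Return the two-digit code for an industry entry on a schedule."""
    if industry is None or not industry.strip():
        return INDUSTRY_CODES["Not reported"]   # assumed rule for blank entries
    return INDUSTRY_CODES[industry.strip()]

print(code_for("Aeroplanes"), code_for(""), LABEL_FOR["64"])
```

An entry that matches no label raises an error rather than being coded silently, which leaves the decision to the editor.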

3. Deciphering. The handwriting of an enumerator or of an informant 
may occasionally be difficult to read. This is especially true when an 
enumerator makes entries on a schedule while he is outdoors in the rain 
or snow. Deciphering such copy is the editor’s task; he not only saves 
time for the tabulator, but also insures accurate results. If entries are 
literally unreadable, the schedule may have to be referred back to the 
enumerator or the person who sent in the information. 

4. Checking entries. The editor may look over the schedules for in- 
consistencies. Entries of age and date of birth may disagree. Something 
is probably awry if an individual reported as aged 8 is also shown to be 
married. Similarly, a mistake has probably (though not necessarily) been 
made if a woman is reported working full time as a blacksmith. Such 
entries must be verified if they are to be used. 
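
Checks of this sort can be written down as explicit rules. In the sketch below the field names, the one-year tolerance between age and year of birth, and the age limit of 14 for the marriage check are all hypothetical choices; a flagged entry is one to be verified, not necessarily one that is wrong.

```python
def check_entry(entry, survey_year):
    """Return a list of inconsistencies found in one schedule entry."""
    problems = []
    age = entry.get("age")
    born = entry.get("birth_year")
    if age is not None and born is not None:
        # Age at the survey can differ by one from a simple subtraction of
        # years, depending on whether the birthday has passed.
        if abs((survey_year - born) - age) > 1:
            problems.append("age and year of birth disagree")
    if age is not None and age < 14 and entry.get("marital_status") == "married":
        problems.append("child reported as married")
    return problems

first = check_entry({"age": 8, "birth_year": 1924, "marital_status": "married"}, 1932)
second = check_entry({"age": 40, "birth_year": 1880, "marital_status": "single"}, 1932)
print(first)
print(second)
```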

5. Examining for completeness. The editor must also scrutinize the 
schedule to see if any entries are missing or incomplete. If the missing 
information is important, the schedule must be referred back to the enu- 
merator or to the informant. Otherwise, the editor writes (not 
reported) or a similar entry in place of the missing information. 
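
The examination for completeness may be sketched in the same way. The field names and the function are hypothetical; the function marks each blank entry and also returns the names of the blank fields, so that the editor can decide whether the schedule must be referred back.

```python
NOT_REPORTED = "not reported"

def fill_missing(entry, required_fields):
    """Return a copy with blanks marked, plus the list of fields that were blank."""
    completed = dict(entry)
    missing = []
    for field in required_fields:
        value = completed.get(field)
        if value is None or str(value).strip() == "":
            completed[field] = NOT_REPORTED
            missing.append(field)
    return completed, missing

entry = {"age": 34, "industry": "", "status": None}
completed, missing = fill_missing(entry, ["age", "industry", "status"])
print(completed)
print(missing)
```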

Tabulating the data. After the schedules are edited, the data must be 
organized before finished tables and charts can be made. The following 
discussion treats of three methods that may be used. 

1. The score or tally sheet. For purposes of illustration we shall assume 
that we want to show for the Buffalo study all males who were able and 
willing to work, classified by industry and by employment status. To 
simplify our illustration we shall consider employment status divided into 
three categories: employed full time, employed part time, and unemployed. 
We shall not undertake, at this point, to subdivide part time employment 
or to classify unemployment according to duration or cause. The ac- 
companying score sheet, or tally sheet, shows how this information could 
be assembled from a number of edited schedule cards. Note that it is 
more convenient and also saves space to use code numbers for the indus- 
tries. As pointed out earlier, the numerical coding of the schedule card 
is shown because it is needed when mechanical tabulation is to be used. 


[Tally sheet for one district: Industry and Employment Status, 1932; Male, Able and Willing to Work. Form headings: Area, Districts, Scored, Checked.] 

For hand tabulation (either by means of the tally sheet or by hand sorting, 
which is described in the following paragraph), we should probably 
code only the occupation and the reason for unemployment. Observe 
that the score marks are arranged in groups of five, four vertical and a 
diagonal. This facilitates counting. The data from the schedules are 
scored and then checked, and the totals of the tallies are entered. Since 
the tally sheet shown is for but one district, it is necessary to combine 
the results from a number of such tally sheets to arrive at the desired 
figures for the entire study. The resulting table is shown as Table 2. 

TABLE 2

Employment Status of All Males Enumerated Who Were Able and Willing
to Work, Buffalo Unemployment Survey, 1932

Industry Group                                Employed   Employed   Unem-     Total
                                              full time  part time  ployed

Professional                                      197         18        19       234
Clerical (not otherwise specified)                  1          1        36        38
Domestic and personal service                     328         82       148       558
Government employees (other than teachers)        636        192       249     1,077
Trade and transportation                        1,843        734       883     3,460
  Retail and wholesale trade                      762        163       296     1,221
  Telephone and telegraph                          34         20        24        78
  Railway, express, gas, electric light           687        470       424     1,581
  Water transportation                             42         15        31        88
  Bank and brokerage                               99          6        20       125
  Insurance and real estate                        99          8        22       129
  Other                                           120         52        66       238
Manufacturing and mechanical pursuits           1,590      1,670     2,319     5,579
  Building trades, contractors                     87        115       177       379
  Building trades, wage earners                   103        123       435       661
  Clay, glass, and stone products                  17         27        37        81
  Food and kindred products                       338        100       131       569
  Iron, steel, and their products                 199        600       538     1,337
  Metal products, other than iron and steel        24         76        71       171
  Paper, printing, and publishing                 117         69        50       236
  Wearing apparel and textiles                     99         69        82       250
  Automobiles, parts, and tires                   212        224       435       871
  Lumber and furniture                             79         65       119       263
  Aeroplanes                                       95         13        69       177
  Other                                           220        189       175       584
Labor (not otherwise specified)                     4         11        16        31
Self-employed                                     653         86       122       861
Miscellaneous                                      10          1       111       122

Total, males                                    5,262      2,795     3,903    11,960


Source: Frederick E. Croxton, Unemployment in November 1933, p. 41, Special
Bulletin No. 179, Division of Statistics and Information, New York State
Department of Labor.
The tally sheet is a useful device for organizing information from a small
study. It is apparent, however, that the tally sheet becomes cumbersome
if there are many schedules to be scored or if it is desired to subdivide
classifications. For example, if we desire to show how many men having
part time work fell into each of four classes according to the fraction of
full time worked, it is necessary to divide the part time column of our
tally sheet into five columns, four to accommodate the categories just
mentioned and a fifth to take care of the “fraction not reported” group.
If we wish to show duration of unemployment or cause of unemployment, we
may subdivide the unemployed column. However, if we wish to show both
duration and cause of unemployment so that duration may be studied in
relation to cause, it is necessary to set up a new tally sheet, probably
listing the causes vertically and the classified durations horizontally.
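
The scoring and combining of tally sheets can be sketched in Python. This
is only an illustration under stated assumptions: the entries, the industry
code numbers, and the status labels below are hypothetical and are not
taken from the Buffalo schedules.

```python
from collections import Counter

# Hypothetical edited schedule entries for two districts:
# (industry code, employment status). Codes and counts are illustrative.
district_1 = [(1, "full"), (1, "unemployed"), (3, "part"), (3, "part"), (3, "full")]
district_2 = [(1, "full"), (3, "unemployed"), (3, "part")]

def tally(entries):
    """Score one district's entries, as on a single tally sheet."""
    return Counter(entries)

def combine(sheets):
    """Add the totals of several tally sheets to cover the whole study."""
    total = Counter()
    for sheet in sheets:
        total.update(sheet)
    return total

combined = combine([tally(district_1), tally(district_2)])
print(combined[(3, "part")])  # 3
```

Tallying pairs such as (cause, duration) in the same way would give the
cross-classified sheet described above, with one cell for each combination.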

2. Hand sorting. When a study is fairly small and the schedule forms 
are small enough (and durable enough) to be handled readily, the data 
may be organized by a process of manual sorting. Hand sorting of the 
Buffalo cards is complicated by the fact that there are often several indi- 
viduals on a card. This difficulty is overcome by organizing the data for 
heads of households first, then sorting for entries on line b, then for those
on line c, etc. If we want to obtain the same sort of information as in the 
preceding paragraph, we might begin by sorting out the males and fe- 
males. As a matter of fact, if tables are eventually to be made for females, 
it is more rapid to make an initial sort (for line a) into six piles: three for 
males — full time, part time, and unemployed; and three for females — full 
time, part time, and unemployed. The three piles of cards for males may 
then each be sorted into the industry categories. The cards in each 
pile are then counted to obtain the entries for a table similar to Table 2, 
but for male heads of households only. The next step would be to go 
through an identical sorting process for entries on line b of the schedule,
then for line c, etc. The summation of all of these countings would 
result in Table 2. Details concerning part time employment or duration 
or cause of unemployment necessitate additional sorting. 

3. Mechanical tabulation. Mechanical devices enable the work of 
tabulating a statistical study to be done most expeditiously, provided that 
the study is extensive enough to warrant their use. The use of tabulating 
equipment is recommended when there are a large number of schedules to 
be analyzed or there are numerous entries on each schedule. The process 
consists essentially of the following steps : 

(a) Reducing all entries on the schedule to a numerical code.

(b) Recording these entries on a punch card by punching holes with 
a key punch to represent the code numbers. 
(c) Sorting the cards by means of an electrical or mechanical⁹ sorter.

(d) Assembling data from the sorted cards by means of a tabulator. 

On page 42 is shown a punch card and also an enlarged portion of a 
card similar to the ones used for the Buffalo unemployment study. Each 
card was for an individual. The punch card shown here refers to the 
person listed on line a of the schedule shown previously. The first entry 
on the punch card (10437) covers five columns and refers to the number 
of the schedule so that the punch card and the schedule may be compared 
later if desired; furthermore the first digit, 1, tells us that this schedule 
came from area 1 of the study. There were nine areas in all. The next 
entry, using a single column, shows by a 1 that the individual was a head 
of a household (a 2 would indicate that he was not a head) . The balance 
of the punch card is fairly obvious, except for the two columns marked 
“employment status.” For these columns a two-digit code was devised
which indicated whether the person was working full time or part time 
(and if so, what fraction), or was unemployed (and if so, the reason). 
There is no code number needed for column 11 of the schedule. Since the 
individual referred to on this punch card was working full time, the em- 
ployment status columns are punched 10. Observe that it was necessary 
to use only part of the punch card for this study.
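
The card layout described above can be sketched as a fixed-column record.
This is a minimal sketch under assumptions: the text gives columns 1 to 5
for the schedule number, the next column for head of household, and column
7 for sex, but placing the employment status code in columns 8 and 9 is an
assumption made here for illustration only, as are the names `PunchCard`
and `read_card`.

```python
from dataclasses import dataclass

@dataclass
class PunchCard:
    schedule_number: str     # columns 1-5
    area: int                # first digit of the schedule number
    head_of_household: bool  # column 6: 1 = head, 2 = not a head
    sex_code: int            # column 7
    employment_status: str   # two-digit code; columns 8-9 assumed here

def read_card(columns: str) -> PunchCard:
    """Decode one card from a string of punched digits."""
    schedule = columns[0:5]
    return PunchCard(
        schedule_number=schedule,
        area=int(schedule[0]),
        head_of_household=(columns[5] == "1"),
        sex_code=int(columns[6]),
        employment_status=columns[7:9],
    )

# Schedule 10437, head of household, sex code 1, status 10 (full time).
card = read_card("104371110")
print(card)
```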

After the punch cards have been prepared they are sorted by the sorting 
machine (shown on page 43). If we wish to separate the cards for males 
and females, we set the machine to sort according to the numbers punched 
in column 7 of the punch card. As each card passes through the machine, 
electrical contact is made through the hole and the card is thus routed to 
the compartment which corresponds to its punched number. If the sort 
is for sex, we have cards in compartments 1 and 2 only. If we are sorting 
for age, we first sort for the right-hand digit, thus putting all ages ending 
in 0 in one compartment, all ending in 1 in the next compartment, and so 
on. The cards are then sorted according to the ten's digit. This sorting
results in putting 19, 29, 39, 49, 59, etc., at the bottom of each compartment,
after which 18, 28, 38, 48, 58, etc., drop on top of them, and so on.
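
Sorting first on the right-hand digit and then on the ten's digit is what is
now usually called a least-significant-digit radix sort. The sketch below
assumes two-digit ages and models each pass of the sorter as keeping cards
in their existing order within a compartment; the function names are
illustrative, not part of any machine's description.

```python
def sort_on_digit(cards, digit_of):
    """One pass through the sorter: route each card to compartment 0-9,
    then gather the compartments in order, keeping the order within each."""
    compartments = [[] for _ in range(10)]
    for card in cards:
        compartments[digit_of(card)].append(card)
    return [card for pocket in compartments for card in pocket]

def sort_by_age(ages):
    """Sort two-digit ages with two passes: units digit, then ten's digit."""
    by_units = sort_on_digit(ages, lambda age: age % 10)
    return sort_on_digit(by_units, lambda age: (age // 10) % 10)

ages = [39, 18, 52, 19, 28, 45, 21]
print(sort_by_age(ages))  # [18, 19, 21, 28, 39, 45, 52]
```

The second pass does not disturb the order established by the first within
any one compartment, which is why two passes suffice for two digits.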

When the cards have been sorted according to the categories we desire, 
the tabulation is completed by means of the tabulator (refer to page 43). 
This machine will not only count but also total several items at once and 
then print the results. Suppose we have sorted out all cards for unemployed
people who were able and willing to work, and have arranged these
by sex and by duration of unemployment. The tabulator will not only


9 The devices shown and described here may be leased from the International Business
Machines Corporation, 590 Madison Ave., New York City. Similar machines are
available from Remington Rand Business Service, Inc., 315 4th Ave., New York City.
The Key-Punch.



The Electric Sorter. Cards are placed in the machine at the right and are then sorted 
into the compartments shown. 



The Electric Printing Tabulator. Cards are placed in the machine at the left; results 
are printed on paper at the right. 
count the number of each sex in each duration category but will also
accumulate the time lost, giving totals as needed. As a matter of fact the
tabulator will prepare more detailed tables than those described above,
but this description will suffice to give an idea of its operation.

A somewhat similar but much simpler device is sometimes used for
small studies. It is known as the Findex¹⁰ and consists of cards in which
numbered holes are already punched. Facts are recorded by punching a slot
to connect two adjacent holes. These cards are filed in a special cabinet,
the front of which contains holes which correspond to the holes in the
cards. Suppose that each card represents a workman, and that a slot
connecting a particular pair of holes means that this individual had
an accident which involved a broken collar bone. Now, if we want to
know how many such accidents occurred, we insert a rod through the upper
hole, turn the case upside down (it is pivoted), and all cards recording
broken collar bones will drop down a fraction of an inch. The cards are
then locked in that position, the case is righted, and the cards are counted.
If we want to know how many cases of broken collar bones occurred to
skilled workmen, we put a second rod through the appropriate hole so
that, when the case is inverted, not all cards recording broken collar
bones will drop but only those also referring to skilled workers.
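
The selection performed by the rods amounts to picking out the cards that
are slotted at every rodded position. A minimal sketch follows, assuming
each card is represented by the set of positions slotted on it; the card
names and position labels are hypothetical.

```python
# Each hypothetical card lists the positions at which a slot has been punched.
cards = {
    "card_1": {"broken_collar_bone", "skilled"},
    "card_2": {"broken_collar_bone"},
    "card_3": {"skilled"},
}

def select(cards, rods):
    """Return the cards that drop when rods are inserted at the given
    positions: those slotted at every one of the rodded positions."""
    return sorted(name for name, slots in cards.items() if set(rods) <= slots)

print(select(cards, ["broken_collar_bone"]))             # ['card_1', 'card_2']
print(select(cards, ["broken_collar_bone", "skilled"]))  # ['card_1']
```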

Presentation and analysis. After the data have been organized manu- 
ally or mechanically, the finished statistical tables and charts may be 
drawn up. Statistical tables are discussed in Chapter III, and charts in
Chapters IV through VI. The analysis of data is treated in the remaining 
chapters of the book. 

Statistical Sources 

As pointed out at the beginning of this chapter, statistical data may 
already exist which will serve the purpose of the investigator. The data 
may or may not have been published. They may have been collected 
by an individual, a business firm, a research organization, a trade associ- 
ation, a newspaper or magazine, a government bureau, etc. Some organi- 
zations, such as the United States Census, issue only data which they 
themselves have collected. Such sources are designated as primary. 
Some publications bring together data originally compiled by others and 
are referred to as secondary sources. The Survey of Current Business, pub- 
lished monthly by the United States Bureau of Foreign and Domestic 
Commerce, is a secondary source as it includes data from numerous
governmental and non-governmental sources. Obviously it is preferable to make

10 Available from the Findex Co., Milwaukee, Wisconsin.
use of a primary source whenever possible, but it may often be more
convenient to make use of a secondary source. One invaluable secondary
source of data is the Statistical Abstract of the United States, issued annually
by the Bureau of Foreign and Domestic Commerce. A number of other
sources which are available in many libraries are listed in Appendix A, on 
pages 825-828. 

The reasons for preferring a primary source are: 

(1) The secondary source may contain mistakes due to errors in
transcription when the figures were copied from the primary source.

(2) The primary source frequently includes definitions of terms and
units used. This is an important consideration since intelligent use can
hardly be made of data unless the user knows exactly what is meant by
each term or unit employed by the collecting agency. When data are
taken from several sources, it is particularly important that definitions of
terms and units be scrutinized. The term “family” may sometimes have
the limited meaning of father, mother, and offspring; sometimes it may be
used more or less synonymously with “household.” The term “exports”
may sometimes refer to gross exports (including re-exports); sometimes,
to exports of United States merchandise only. Although a measured
bushel is 2,150.4 cubic inches, a bushel by weight does not represent the
same number of pounds for all commodities. For example, a bushel of
green peanuts in the shell weighs 22 pounds, a bushel of oats weighs 32
pounds, and a bushel of apples weighs 45 pounds; but a bushel of wheat,
beans, peas, or potatoes weighs 60 pounds. The Statistical Abstract of
the United States, although a secondary source, includes the necessary
definitions of units.

(3) The primary source usually includes a copy of the schedule and a
description of the procedure used in selecting the sample and in collecting
the data; the reader is thus enabled to ascertain how much confidence to
repose in the findings of the study.

(4) A primary source usually shows greater detail. A secondary source 
often omits part of the information or combines categories, such as showing 
counties instead of townships, or states instead of counties. 
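
The point made under reason (2) about bushels by weight can be illustrated
with a small conversion. The weights are those quoted in the text; the
function name and the quantities converted are assumptions chosen only for
the example.

```python
# Pounds per bushel by weight, as quoted in the text.
POUNDS_PER_BUSHEL = {
    "green peanuts in the shell": 22,
    "oats": 32,
    "apples": 45,
    "wheat": 60,
    "beans": 60,
    "peas": 60,
    "potatoes": 60,
}

def bushels_to_pounds(commodity, bushels):
    """Convert bushels by weight to pounds for one commodity."""
    return bushels * POUNDS_PER_BUSHEL[commodity]

# The same number of bushels stands for very different weights.
print(bushels_to_pounds("oats", 100))   # 3200
print(bushels_to_pounds("wheat", 100))  # 6000
```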

Reliability of data. The analyst should not make use of data, from 
either a primary or a secondary source, without assuring himself as to the 
reliability, accuracy, and applicability of the data. There are numerous 
points worthy of consideration here:

(1) If the enumeration was based on a sample, was the sample
representative? Occasionally an investigator may select a sample of
households at random from the lists of subscribers of a utility company,
forgetting that the very poor may often not be users of gas or electricity and
sometimes may not even possess a piped water supply. 
(2) Was the schedule well designed? Were any leading questions or
ambiguous questions included? 

(3) Was the collecting agency unbiased or did it “have an axe to grind”?
It is well to remember that bias may enter either consciously or uncon- 
sciously. 

(4) Was a selective factor introduced because of careless enumeration? 
For example, in an unemployment study, canvassers might be careless 
about following up their calls at houses where no one was at home, and 
thus perhaps the data would show a smaller number of employed persons 
than actually existed. 

(5) Were the enumerators capable and properly trained? Incompetent 
or poorly trained enumerators cannot be depended upon to produce useful 
results. 

(6) Was the editing carefully and conscientiously done? Careless 
coding or computing, on the part of editors, may render of little value the 
findings of an otherwise valuable study. 

(7) Was the tabulating (tally sheets, sorting, or mechanical tabulations) 
performed with care and accurately checked? 

(8) In view of the definitions used, the area studied, and the methods of 
procedure, are the data applicable to the problem that is under investi- 
gation? 

It is not always possible to ascertain the quality of work which was 
done by enumerators, editors, and tabulators. Most primary sources, 
however, reproduce a copy of the schedule used and generally give a more 
or less adequate description of the methods and procedures followed. 
Additional information may frequently be had by correspondence. 

When using data over a period of years from a given source, we must be
sure that definitions of terms have not changed or, if they have changed,
we must make due allowance for the change if it is possible to do so. For
example: at the Census of 1910 and 1920, an urban area was defined as an
incorporated place (including all towns [townships] in Massachusetts,
Rhode Island, and New Hampshire) having 2,500 or more inhabitants.
In 1930 the definition was modified first “to include townships and other
political subdivisions (not incorporated as municipalities, nor containing
any areas so incorporated) which had a total population of 10,000 or more,
and a population density of 1,000 or more per square mile.” This change
affected 28 places in 5 states. A second modification, affecting New
Hampshire, Massachusetts, and Rhode Island, included as urban the
regularly incorporated cities and “only those towns in which there is a
village or thickly settled area having more than 2,500 inhabitants and
comprising, either by itself or when combined with other villages within
the same town, more than 50 per cent of the total population of the town.”
Comparability of different sources. When data are to be drawn from
two or more sources, the reliability of each source must be considered and, 
in addition, the user must be sure that the data from the different sources 
are comparable. Let us list some of the reasons for lack of comparability: 

(1) Different definitions of terms may have been used. Coal produc- 
tion is given by the United States Bureau of Mines in short tons of 2,000 
pounds, while exports of coal are shown by the Bureau of Foreign and 
Domestic Commerce in long tons of 2,240 pounds. As if these two sorts 
of tons were not sufficiently confusing, it is necessary to be aware of two 
other “tons” used in shipping. These are the gross ton and the net (or
registered) ton, each of which represents 100 cubic feet. Gross tonnage
is the capacity of the hull plus the enclosed spaces above deck available
for cargo, stores, passengers, and crew; whereas net tonnage is the gross 
tonnage less the space occupied by propelling machinery, fuel, crew 
quarters, master’s cabin, and navigation spaces — in other words, approxi- 
mately the space available for cargo and passengers. 

Because of different accounting systems, the term “profit” may have
different meanings in different industries. Profit for a railroad may be 
quite different from profit for a department store. In a certain industry, 
carried on almost solely by partnerships, an investigator found that many 
firms showed little or no profit and that great differences were present 
among firms. The partners were frequently paying themselves generous 
salaries, and therefore a new term “profit plus partners' salaries” was used
for the study. Ages may be reported as of the last birthday; as of the
nearest birthday; or, in Oriental fashion, as of the next birthday. Com- 
parability of age data is thus affected by the bases of reporting. 

(2) Different methods of computation or estimation may have been 
employed. For example, the methods of estimating population were re- 
sponsible for two different estimates of the July 1, 1935, population of 
Yonkers, N. Y. One organization announced the population to be 144,233, 
while another estimated it as 157,455. The lower estimate assumed that 
Yonkers had grown since 1930 at the same rate as had the United States, 
the growth of the United States being determined by considering the ex- 
cess of births over deaths and figures of net immigration. The second esti- 
mate appears to have been arrived at by assuming that the percentage 
change in the population of Yonkers from 1930 to 1935 was about one-half 
of the percentage change from 1920 to 1930. 

(3) The samples may have been so chosen that the results are not 
comparable. Or, perchance, one study may have been based on a sample, 
whereas the other was a complete enumeration. It is, of course, possible 
to so choose a sample that the results of a study may be forced to fit a 
preconceived idea. 
(4) Different standards of accuracy may have prevailed with respect to
enumeration, editing, and tabulating. 

(5) The sources may not be comparable in respect to areas included, or 
in respect to the period of time to which they refer. When the
chronological difference is not too great, comparisons may sometimes be made or
adjustments effected. 
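
Before coal production and coal exports can be compared, as noted under
item (1), both must be put on the same basis. The sketch below converts
short tons of 2,000 pounds to long tons of 2,240 pounds; the production
figure is hypothetical and is chosen only to make the arithmetic easy to
follow.

```python
from fractions import Fraction

POUNDS_PER_SHORT_TON = 2000
POUNDS_PER_LONG_TON = 2240

def short_to_long_tons(short_tons):
    """Convert short tons to long tons exactly, by way of pounds."""
    return Fraction(short_tons) * POUNDS_PER_SHORT_TON / POUNDS_PER_LONG_TON

# Hypothetical production figure, in short tons.
production_short_tons = 1_120_000
print(short_to_long_tons(production_short_tons))  # 1000000
```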

Whether an investigator is using primary or secondary sources, it is 
necessary to keep on the lookout for obvious mistakes and misprints. On 
page 392 of the Statistical Abstract of the United States for 1931, it is shown 
that in Continental United States potential water power amounting to 
38,110,000 horse power is available 90 per cent of the time, while potential 
water power of 9,166,000 horse power is available 50 per cent of the time. 
It is clear that there must be a greater potential horse power available for 
50 per cent of the time than for 90 per cent of the time. Data are given 
for each state and, if these details are added, it appears that 59,166,000 
horse power of potential water power are available 50 per cent of the time. 
Obviously this was a typographical mistake which occurred in printing 
the abstract, or possibly was carried over from the primary source. Such 
an apparent contradiction would be observed at once by the experienced 
user of figures. 
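
A check of this kind is easily stated as a rule: the power available 50 per
cent of the time cannot be less than the power available 90 per cent of the
time. The following sketch applies that rule to the figures quoted above;
the function name is illustrative.

```python
def availability_is_consistent(power_90_pct, power_50_pct):
    """Power available half the time must be at least as great as
    power available nine-tenths of the time."""
    return power_50_pct >= power_90_pct

# Figure as printed in the Abstract, then the sum of the state details.
print(availability_is_consistent(38_110_000, 9_166_000))   # False
print(availability_is_consistent(38_110_000, 59_166_000))  # True
```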
Methods of Presentation

Four methods of statistical presentation are available. Data may be 
(1) incorporated in a paragraph of text, (2) put into tabular form, (3) 
placed in a semi-tabular arrangement, or (4) expressed graphically. 

Text presentation. Combining figures and text is not a particularly 
effective device, since it is necessary to read, or at least scan, the entire 
paragraph before one can grasp the meaning of the entire set of figures. 
Most persons cannot easily comprehend the data when set forth in this 
manner, and it is especially difficult for the reader to single out individual 
figures. There is the advantage, however, that the writer can direct
attention to, and thus emphasize, certain figures and can also call attention to
comparisons of importance. Following is an example of text presentation: 

The United States Bureau of Foreign and Domestic Commerce presented,
in the December 1937 Monthly Summary of Foreign Commerce, data of
exports of United States merchandise and of imports for consumption
(not including imports for purposes of re-export), segregated into
“economic classes” and for various years. Comparing 1936 and 1937, the
total value of exports was $2,418,969,000 in 1936 and $3,294,916,000 in
1937, while the total value of imports for consumption was
$2,423,977,000 in 1936 and $3,012,487,000 in 1937. Crude materials
exported in 1936 amounted to $668,168,000, or 27.6 per cent of the total
value of exports for that year; and in 1937 were $721,871,000, or 21.9 per
cent of that year's total. Imports of crude materials amounted to
$732,965,000 in 1936 and $973,535,000 in 1937, or respectively 30.2 per
cent and 32.3 per cent of total imports for consumption in the two years.
Crude foodstuffs exported in 1936 were valued at $58,144,000, which was
2.4 per cent of total exports for that year; and $101,742,000, or 3.1 per
cent of the total, in 1937. Imports of crude foodstuffs for consumption
were $348,682,000, or 14.4 per cent of the total value of imports for
consumption in 1936; and $413,345,000, or 13.7 per cent of the total in
1937. Manufactured foodstuffs exported in 1936 came to $143,798,000,
or 5.9 per cent of the year's total; and in 1937 were $177,451,000, or 5.4
per cent of the total. Imports of manufactured foodstuffs for consumption
amounted to $386,240,000, or 15.9 per cent of the total imports
in 1936; and $440,103,000, or 14.6 per cent of the total in 1937.
Semi-manufactures exported in 1936 were valued at $394,760,000, or 16.3
per cent of the total; in 1937 they were $677,254,000, or 20.6 per cent of
the year's exports. Imports of semi-manufactures for consumption totaled
$490,238,000, or 20.2 per cent of all imports for consumption in 1936; and
$634,181,000, or 21.1 per cent of the total in 1937. Finished
manufactures worth $1,154,099,000, or 47.7 per cent of the total for that year,
were exported in 1936; and $1,616,598,000 worth, or 49.1 per cent of the
total, in 1937. Of finished manufactures imported for consumption
$465,852,000 worth, or 19.2 per cent of all imports for consumption, came
in during 1936 and $551,323,000, or 18.3 per cent of the total, were
received in 1937.

Tabular presentation. The same data that were included in the preceding
text statement are shown in Table 3. This method of setting forth
statistical data is usually superior to the use of text. A table with its title
should be fully self-explanatory, although it may frequently be accompanied
by a paragraph of interpretation or a paragraph directing attention to
important figures.

TABLE 3

Exports of United States Merchandise and Imports for Consumption, by
Economic Classes, 1936 and 1937

(Materials imported for purposes of re-export are not included.)

                                          Value                 Per cent of
                                  (thousands of dollars)        total value
Economic class                       1937         1936         1937     1936

Exports of United States
  merchandise, total              3,294,916    2,418,969      100.0    100.0
  Crude materials                   721,871      668,168       21.9     27.6
  Crude foodstuffs                  101,742       58,144        3.1      2.4
  Manufactured foodstuffs           177,451      143,798        5.4      5.9
  Semi-manufactures                 677,254      394,760       20.6     16.3
  Finished manufactures           1,616,598    1,154,099       49.1     47.7

Imports for consumption, total    3,012,487    2,423,977      100.0    100.0
  Crude materials                   973,535      732,965       32.3     30.2
  Crude foodstuffs                  413,345      348,682       13.7     14.4
  Manufactured foodstuffs           440,103      386,240       14.6     15.9
  Semi-manufactures                 634,181      490,238       21.1     20.2
  Finished manufactures             551,323      465,852       18.3     19.2

Source: United States Bureau of Foreign and Domestic Commerce, Monthly
Summary of Foreign Commerce, December 1937, p. 36.

It is readily seen that the table is much briefer than the text statement, 
since the row and column headings eliminate the necessity of repeating 
explanatory matter. As no text appears with the figures, the presentation
is more concise. The logical arrangement of items in the stub (the
left-hand column and its heading) and caption (the headings of the other
columns) makes a table clear and easy to read. The use of columns and
rows for the figures facilitates comparisons. 
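
The percentage columns of a table such as Table 3 are computed from the
value columns. As a brief check, the sketch below recomputes the 1937
export percentages from the values given in Table 3; the variable names
are illustrative.

```python
# Exports of United States merchandise, 1937, in thousands of dollars
# (values from Table 3).
exports_1937 = {
    "Crude materials": 721_871,
    "Crude foodstuffs": 101_742,
    "Manufactured foodstuffs": 177_451,
    "Semi-manufactures": 677_254,
    "Finished manufactures": 1_616_598,
}

total = sum(exports_1937.values())

# Each class as a percentage of the total, rounded to one decimal place.
percents = {
    name: round(100 * value / total, 1) for name, value in exports_1937.items()
}

print(total)  # 3294916
for name, pct in percents.items():
    print(f"{name:<26}{pct:>6.1f}")
```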

In Table 4 the various parts of the table have been slightly separated
and labeled for identification. A table will have at least the four
essentials: title, stub, caption, and body. There may also be present a
prefatory note (see Table 3) and one or more footnotes (see Table 7). If the
figures in the table are not original, a source note is also included,
sometimes with the prefatory note but usually below the table and below the
footnotes to the table, if any are present.

TABLE 4

Population of the United States, by Geographic Divisions, 1930

[The parts of the table are labeled in the original: title, stub, caption,
body, and source.]

Source: Fifteenth Census of the United States, 1930, Population,
Volume I, p. 10.

Semi-tabular presentation. When only a few figures are to be used in a 
discussion, the text may be broken and the data listed as follows:¹

. . . the employer “must gradually instruct his apprentice in the various
operations of the trade to finally produce a competent worker.” During
the 2-year training period the apprentice must attend the courses in
hygiene and related work at the university and obtain a certificate.

The apprentice wage scale for a week's work is:

    After 6 months at the school     $ 7.50
    After 12 months                   10.00
    After 18 months                   12.00

1 Monthly Labor Review, August 1935, p. 409. “Regulation of Beauty Shops under
Quebec Labor Law.”


Division                  Population    Per cent
                                        of total

United States            122,775,046      100.0

New England                8,166,341        6.7
Middle Atlantic           26,260,750       21.4
East North Central        25,297,185       20.6
West North Central        13,296,915       10.8
South Atlantic            15,793,589       12.9
East South Central         9,887,214        8.1
West South Central        12,176,830        9.9
Mountain                   3,701,789        3.0
Pacific                    8,194,433        6.7
This method is not often used, but it is serviceable in that the figures
are made to stand out from the text as they would not do if worked into 
one or two of the sentences. Incidentally, the figures can be more readily 
compared than if they were in the text. 

Graphic presentation. Graphic devices are extremely useful and 
effective for quickly presenting a limited amount of information. The 
following chapters deal with curves, bar charts, maps, and other statistical 
diagrams. 

Leading Considerations 

Types of tables. From the point of view of usage there are two types 
of tables. In the first place there are general or reference tables, which 
are used as a repository of information. These are frequently very ex- 
tensive, covering many pages, as, for example, Table 33 in Population 
Volume II of the 1930 Census, which takes up 18 pages. Such tables 
give detailed information arranged for ready reference. No attempt is 
made to arrange the entries in a general table so that emphasis will be 
placed on certain items, nor is there usually any reason for arranging
columns and rows in order to bring out comparisons desired by the
investigator. The primary, and usually sole, purpose of a reference table is to
present the data in such a manner that individual items may be found 
readily by a reader. Reference or general tables are often placed in an 
appendix of a published report.²

In the second place there are summary or text tables, which are usually 
relatively small in size and which are designed to set forth one finding or a 
few closely related findings as effectively as possible. While the reference 
table may be rather complicated with subheadings and sub-subheadings 
in stub and caption, the summary table should be relatively simple in 
construction. It frequently accompanies a text discussion and hence is 
also referred to as a text table. If a reader is expected to divert his atten- 
tion from a running discourse to a table, it is essential that it be not too 
formidable, but rather simple and easy to understand. Too many readers 
have a tendency to skip all the tables in a report, and this tendency can be 
combatted successfully only by making tables appear so simple as to be
innocuous and by introducing simple and attractive graphs. Because of 
the purpose which a summary table is to serve, the items shown therein 
will be arranged to place emphasis where desired and the columns and 
rows will be so placed as to emphasize the comparisons of paramount
importance.

2 See, for example, Helen Herrmann, Ten Years of Work Experience of Philadelphia 
Machinists, Works Progress Administration, National Research Project and Industrial 
Research Department of the University of Pennsylvania, Philadelphia, 1938. 
A summary table is almost invariably the result of boiling down
information contained in one or more reference tables, although upon occasion a
summary table may be based, in whole or in part, upon one or more other 
summary tables. Still more rarely a summary table may be constructed 
directly from data contained in schedule forms. The methods which can 
be used in deriving one table from one or more others are : 

1. Data which are not important for the problem in hand may be
omitted. Thus, although there are twenty-five states which produce 
bituminous coal, it might suffice to show separate data for only the leading 
ten or fifteen states. 

2. Detailed data may be combined into groups. Thus data shown by 
states may be grouped into geographical divisions. Again, data shown 
by individual industries may be combined into broader industrial groups. 

For example, the manufacture of brick, tile, and terra cotta products; 
of cement, glass, and pottery; and the quarrying of marble, granite, slate, 
and like products may be combined into the major category “clay, stone,
and glass products.”

3. The arrangement of data may be altered. Thus an alphabetical 
arrangement of cities may be replaced by an arrangement according to 
size of municipality. 

4. Averages, ratios, percentages, or other computed measures may be 
substituted for, or given in addition to, the original absolute figures. A 
column of ratios is shown in Table 8. It will be observed that these 
figures facilitate the interpretation of the data upon which they are 
based. 
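
The fourth method can be illustrated with a short computation. The sketch below recomputes the "males per 100 females" ratios for a few rows of Table 8 from the male and female counts; the function name `males_per_100_females` is an illustrative assumption, not a term used in the text.

```python
# Male and female counts for three rows and the total row of Table 8.
counts = {
    "Native white": (48_010_145, 47_487_655),
    "Foreign-born white": (7_153_709, 6_212_698),
    "Negro": (5_855_669, 6_035_474),
    "Total": (62_137_080, 60_637_966),
}


def males_per_100_females(male, female):
    """Ratio of males to females, expressed per 100 females, to one decimal."""
    return round(100 * male / female, 1)


ratios = {name: males_per_100_females(m, f) for name, (m, f) in counts.items()}

for name, ratio in ratios.items():
    print(f"{name:<20}{ratio:>8.1f}")
```

A single column of such ratios lets the reader compare groups of very different size at a glance, which the absolute counts alone do not permit.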

Comparisons. While the arrangement into columns and rows facilitates 
comparison of the data, such treatment does not automatically focus atten- 
tion upon the comparisons that are important. This may be effected by 
placing the figures to be compared in contiguous columns or rows. Thus 
it may be seen that Table 5 facilitates the comparison of any one of the
three items (number of returns, net income, or tax liability) in 1934 with
that same item in 1935; whereas Table 6 makes it easy to compare number
of returns, net income, and tax liability with each other for either 1934 or 
1935. Either table enables us to compare one income class with another 
for a given year; however, such a comparison for two (or more) years is 
made more readily when the arrangement of columns is as given in Table 5.
Each of these tables is well constructed, but each focuses attention upon 
a different comparison. One of the most important considerations in 
table construction is that figures which are to be compared must be placed 
in immediate juxtaposition. It should be remembered that two or more 
series of figures are more easily compared when placed in adjacent columns 
than when placed in adjacent rows, and that figures of a series are more
easily compared with each other when arranged in a column than when
placed in a row. 

Comparisons may be greatly facilitated by the use of ratios, percentages,
averages, or other computed relationships. Ratios are shown in Tables 8
and 9; percentages, which are really a form of ratio (see Chapter VII),
are included in Tables 3, 4, and 7. Ratios and percentages are particularly
useful when the absolute figures to be compared are large. Note
in Tables 7 and 8 that rather large population figures can be compared
readily by the use of percentages and ratios. The use of averages is shown
in Table 10; these averages would be particularly useful when compared
with similar figures for other years.³ When tables show monthly fluctuations
and both maxima and minima are noted, as in Table 10, the additional
entry “Per cent variation of minimum from maximum” is useful to show
the shrinkage from the high point to the low point during the year.

TABLE 7

Population and Area of Continental United States and Outlying Territories
and Possessions, 1930

                                                  Population          Gross area in
Region                                            Number    Per cent   square miles
                                                            of total
-----------------------------------------------------------------------------------
  Total                                      137,008,435      100.00      3,738,395
-----------------------------------------------------------------------------------
Continental United States                    122,775,046       89.61      3,026,789
Outlying territories and possessions          14,233,389       10.39        711,606
  Philippine Islands                          12,082,366#       8.82        114,400
  Porto Rico                                   1,543,913        1.13          3,435
  Hawaii Territory*                              368,336         .27          6,407
  Alaska                                          59,278         .04        586,400
  Panama Canal Zone                               39,467         .03            549
  Virgin Islands of the United States             22,012         .02            133
  Guam                                            18,509‡        .01            206
  American Samoa†                                 10,056         .01             76
  Military and naval, etc., services abroad       89,453         .07

* Includes Midway Islands.
† Includes Swain Island.
# Estimated population July 1, 1929 (Thirteenth Annual Report of the Director of Education).
‡ Includes 1,118 persons ... stationed at Guam.

Source: Fifteenth Census of the United States, 1930, Volume I, p. 5.
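
The entry "Per cent variation of minimum from maximum" described above is simply the drop from the highest to the lowest monthly figure, taken as a percentage of the highest. The following sketch, offered only as an illustration, applies that arithmetic to the twelve monthly figures of the "Trade, retail and wholesale" column of Table 10; the variable names are assumptions of this example.

```python
# Monthly employment, January through December 1937, from the
# "Trade, retail and wholesale" column of Table 10.
trade = [65_950, 66_711, 68_504, 69_462, 69_857, 71_156,
         71_140, 70_604, 72_148, 73_316, 70_624, 72_482]

high = max(trade)  # the figure that would be set in boldface
low = min(trade)   # the figure that would be set in italic

# Shrinkage from the high point to the low point, as a per cent of the high.
pct_variation = round(100 * (high - low) / high, 1)

# Average of the twelve monthly figures, rounded to a whole number.
average = round(sum(trade) / len(trade))

print(high, low, pct_variation, average)
```

The results agree with the last two entries printed for that column (70,163 and 10.0).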

Emphasis. The proper placing of an item in a table enables it to be
given suitable emphasis. Since occidentals read from left to right and
from top to bottom, it follows that the most prominent position in the stub
is at the top, and in the caption the most prominent position is at the left;
likewise, the position of least prominence is at the bottom of the stub and
at the right of the caption. Notice that, by following this principle in
Table 3, exports were emphasized at the expense of imports, and 1937 was
placed in a more prominent position than 1936.

3 See, for example, Monthly Labor Review, January 1936, p. 49.

TABLE 8

Population of the United States, by Sex and by Race and Nativity, 1930

                                                                       Males
                            Total                                    per 100
Race and nativity         both sexes          Male        Female     females
----------------------------------------------------------------------------
Native white              95,497,800    48,010,145    47,487,655       101.1
Foreign-born white        13,366,407     7,153,709     6,212,698       115.1
Negro                     11,891,143     5,855,669     6,035,474        97.0
Mexican                    1,422,533       758,674       663,859       114.3
Indian                       332,397       170,350       162,047       105.1
Japanese                     138,834        81,771        57,063       143.3
Chinese                       74,954        59,802        15,152       394.7
Filipino                      45,208        42,268         2,940      1437.7
Hindu                          3,130         2,860           270      1059.3
Korean                         1,860         1,223           637       192.0
All other                        780           609           171       356.1
----------------------------------------------------------------------------
  Total                  122,775,046    62,137,080    60,637,966       102.5

Source: Fifteenth Census of the United States, 1930, Population Volume II, p. 103.


Totals are generally placed in either the most prominent or the least 
prominent position, depending upon whether or not it is desired to give 
emphasis to them. When “total” is shown at the top in the stub, a line
should be placed below the first row of figures, as in Table 7. If the total
entry is at the bottom of the stub, the figures are set off by a line drawn
above them, as in Table 8. An alternate procedure consists of using a
space instead of a line to set off the totals. Whatever its position, the
word “total” in the stub should be indented if possible.

Individual figures, or columns or rows of figures, may also be emphasized 
by the use of boldface type, as in Table 9. When monthly fluctuations 
of employment, sales, or other factors are shown, the maximum figure 
may be set in boldface and the minimum may be put in italic type, as in 
Table 10. In general, italic is used to indicate an exception rather than
for emphasis; thus italic type may be used for figures which are not to be
included in taking a total.⁴

4 See Population Volume I, Fifteenth Census of the United States, 1930, p. 69.

Arrangement of items in stub and caption. Considering the basic
nature of statistical data which may be encountered, we have noted (p. 3)
that data may refer to geographical, chronological, qualitative, or quantitative
classifications. We are now interested in the methods which may
be employed in arranging the items in the stub or the caption (box head)
of a table. The method of arrangement will be determined partly by the
nature of the data (whether basically geographical, chronological, qualitative,
or quantitative), and partly by a consideration of whether the data
are to appear in a reference table or in a summary table. A number of
different methods of arrangement may be employed.

TABLE 9

Net Earnings and Changes in Total Capital Account of Insured Commercial
Banks¹ during 1934

                                               Amount          Amount per $100 of
                                         (in millions       Total       Total       Total
Earnings or change                        of dollars)   available    deposits     capital
                                                           funds²                 account
-----------------------------------------------------------------------------------------
The banks' net earnings from current
  operations were                                 445       $1.03       $1.25       $7.27
Recoveries on assets written off and
  profits on securities sold were                 290         .68         .82        4.73
Net earnings and recoveries were                  735        1.71        2.07       12.00
The banks paid interest on capital
  notes and debentures, and dividends
  on preferred and common stock of                185         .43         .52        3.02
There remained after payment of
  interest and dividends                          550        1.28        1.55        8.98
The banks wrote off losses of                   1,080        2.52        3.04       17.64
The resulting reduction in capital
  account was                                     530        1.24        1.49        8.66
New capital funds were paid in to the
  net amount of                                   610        1.42        1.72        9.97
The net increase in the total capital
  account was³                                     80         .18         .23        1.31

1 14,124 banks; figures for 11 State banks in the District of Columbia, 2 insured national banks in
Alaska, and 9 other insured banks are not included. Figures for national banks for second half of 1934
are estimated.

2 Estimated average amount during year of total assets less customers' liability on account of acceptances,
acceptances of other banks and bills sold with endorsement, and securities borrowed.

3 Exclusive of changes resulting from ... and closing of banks.

Source: Annual Report of the Federal Deposit Insurance Corporation for the Year Ending December 31,
1934, p. 55.

Alphabetical. This method of arrangement is admirably adapted for 
use in a general table, because it enables individual items to be located 
with ease. It is, obviously, not a useful method for text tables. It can 
be used only with series which are classified geographically or qualitatively. 

Geographical. The geographical method of arrangement may be employed
for series classified geographically, but it is applicable only when an
established usage has been set up and should be used only when the statistician
is sure that his readers are familiar with the classification. In
Table 4 the various geographic divisions of the United States are listed in
their customary order. The various states in each division and their
order of listing may be seen in Population Volume I, page 10, of the
Fifteenth Census of the United States, 1930. Although the Census makes
frequent use of the geographical method of arrangement for the states, it
almost invariably lists the counties of a state alphabetically. For ease
of reference, in a general table, the geographical arrangement is hardly
so satisfactory as the alphabetical. While it may be argued that the geographical
arrangement often places together contiguous, and therefore
comparable, areas, it must be obvious that the geographical arrangement
does not always do so. It is not usually a good method of arrangement for
a summary table, since this arrangement does not place important items
in prominent positions.

TABLE 10

Number of Wage Earners of Both Sexes Reported Employed in Ohio Establishments on the 15th of Each Month, by
Industry Group, 1937

(Reports are required from all mines and quarries and from all other concerns employing three or more persons. Both full-time and part-time employees are included.)

                                                                                         Transporta-
                                                             Mining                Trade,   tion and
                    All      Agri-  Construc-   Manufac-        and            retail and     public
Month        industries    culture       tion      tures  quarrying    Service  wholesale utilities*
----------------------------------------------------------------------------------------------------
January         997,641        [?]     35,089    678,023     33,062    119,489     65,950        [?]
February      1,029,826        [?]     36,432    705,642     33,287    120,665     66,711        [?]
March         1,051,082        [?]     38,673    719,137     33,629    122,332     68,504        [?]

April         1,063,403     10,147     47,269    717,690     30,483    126,805     69,462        [?]
May           1,095,245     10,728     53,260    738,546     30,677    129,352     69,857        [?]
June          1,067,684     12,475     56,768    702,103     30,517    131,367     71,156        [?]

July          1,093,490        [?]     60,014    726,013     30,287    129,166     71,140     63,388
August        1,098,664        [?]     62,218    731,714     30,066    128,685     70,604     63,011
September     1,109,800        [?]     61,242    735,868     32,294    131,722     72,148     63,484

October             [?]     13,641     57,582        [?]     34,288    128,621     73,316     63,055
November            [?]     10,290     60,733        [?]     34,093    125,968     70,624     61,174
December            [?]      8,331     36,835        [?]     34,041    123,419     72,482     60,092

Average             [?]     10,662     49,676    704,782     32,227    126,466     70,163     61,697
Per cent variation of
minimum from maximum
                    [?]        [?]       43.6       16.0       12.3        9.3       10.0        7.8

* Not including interstate transportation.

Source: Division of Labor Statistics, Department of Industrial Relations of Ohio.

Magnitude. A very satisfactory method of arranging items in a sum- 
mary table consists of listing them according to size, usually with the 
largest item first, but sometimes with the order reversed. The outlying
territories and possessions shown in the stub of Table 7 are given in order 
of magnitude. When the largest item is placed first, the most important 
items (numerically) are placed in the most prominent positions. Arrange- 
ment of items according to size is not useful in a general table because it 
does not facilitate the finding of individual items as does the alphabetical 
arrangement. Data classified geographically or qualitatively may be 
arranged according to magnitude. So also may data classified chrono- 
logically, but they lose their chronological sequence when arranged by 
magnitude. 
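
The contrast between the alphabetical arrangement (suited to a reference table) and the arrangement by magnitude (suited to a summary table) can be seen by sorting the same items both ways. The sketch below uses the populations of the outlying territories and possessions from Table 7; the variable names are assumptions made for illustration.

```python
# Populations of the outlying territories and possessions, from Table 7.
populations = {
    "Hawaii Territory": 368_336,
    "Guam": 18_509,
    "Philippine Islands": 12_082_366,
    "Alaska": 59_278,
    "Virgin Islands of the United States": 22_012,
    "Porto Rico": 1_543_913,
    "American Samoa": 10_056,
    "Panama Canal Zone": 39_467,
}

# Reference-table order: easy to locate any one item.
alphabetical = sorted(populations)

# Summary-table order: largest item first, so that the numerically most
# important items occupy the most prominent positions in the stub.
by_magnitude = sorted(populations, key=populations.get, reverse=True)

for name in by_magnitude:
    print(f"{name:<38}{populations[name]:>12,}")
```

Sorting by magnitude reproduces the order in which these items appear in the stub of Table 7.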

Historical. Data classified on a chronological basis would generally 
be arranged chronologically or historically. When years are listed, either 
the most recent or the earliest date may be shown first. The months, 
however, are usually listed with January first. When the historical ar- 
rangement is called for, it is adapted to either general or text tables. The 
historical arrangement is used in the stub of various tables in Chapter XV.

Customary. Certain data that are basically qualitative are generally 
arranged according to customary classes. The exports and imports of 
Table 3 are grouped into five categories: crude materials, crude foodstuffs, 
manufactured foodstuffs, semi-manufactures, and finished manufactures. 
In the stub of Table 8 the population of the United States is divided into 
ten customary groups upon a race and nativity basis, which are generally 
listed in the order shown, while in two of the box headings of this table the 
sexes are listed as male and female and are invariably given in this order. 
It will be noticed that following the listing of ten groups of the population 
there is given a category “all other.” Such a group is usually placed at
the bottom in the stub, or at the right in the caption. Good statistical
practice dictates that an “all other,” “miscellaneous,” or “not reported”
group should include relatively small numbers; otherwise the adequacy of
the classification or the accuracy of the collection of the data may be questioned.
Arrangement by customary classes is appropriate for either a
text or a reference table.

Quantitative data may be arranged into classes as
shown in the stub of Table 5. Other arrangements of this type are shown 
in Chapter VIII. Distributions of this sort generally begin with the class 
of smallest numerical value, as in Table 5. Such an arrangement may be 
used in either a text or a reference table. 

Progressive. This method of arrangement is illustrated in the stub of 
Table 9. Notice that the items are listed in such a way that the final 
figure develops logically from those given before. The table, as shown in 
the original report, contained no stub heading. If items are related to 
such a degree that they can be placed together in a stub, it should ordinarily 
be possible to write a stub heading. The student should consider what 
other headings might be appropriate for the stub of this table. Another 
example of the progressive arrangement is given in the Monthly Labor Review
for February 1939, page 357. Monthly data are shown for the number
of strikes, and the progressive headings in the caption are:

Continued        Beginning     In progress    Ended       In effect
from preceding   in month      during month   in month    at end of
month                                                     month

The progressive arrangement is suitable for either text or reference tables. 

Numerical. The wards of cities are usually designated as Ward 1, 
Ward 2, etc. When data for such subdivisions are shown, a numerical 
arrangement is generally followed. The precincts and districts of counties 
are sometimes numbered; the departments of a factory and salesmen's 
territories or sales areas may also be identified by numerical designations. 
This method may appear in either a text or a reference table. The num- 
bers assigned to the categories are frequently only labels serving to identify 
some underlying arrangement. For example, in a shoe factory, Department
1 was the cutting department; Department 2, the fitting department;
Department 3, the lasting department, etc. 

In using the various methods of arrangement, remember that the items 
should be arranged for greatest ease of reference in a reference table, 
whereas in a text table the arrangement should be designed to emphasize 
the important items and to stress the proper comparisons. 

Details of Table Construction

Title and identification. A title should accompany every table and is 
customarily placed above the table. The title should be clearly worded 
and should state briefly what data are shown in the table. A title should 
be so worded as to mention the more important considerations first, placing 
toward the end any reference as to how the items are arranged and what 
period of time is covered. In general the title states, in order: what, 
where, how classified, and when. Illustrations of titles are shown in the 
various tables of this chapter. It will be noted that, when a title necessi- 
tates the use of several lines, an inverted pyramid arrangement is used. 

If a title is unduly long, it may be advantageous to place a “catch title”
above the main title or even to substitute the catch title for the full title.
This shorter title undertakes merely to state the general nature of the data
in the table. For Table 3 a catch title might read “Foreign Trade, 1936
and 1937.”

When more than one table is included in a study, it is desirable to num- 
ber the tables consecutively in order that each one may be identified by 
number rather than by title. 

Prefatory note and footnotes. A prefatory note, one or more footnotes, 
and a source note may be appended to a table. A prefatory note is placed 
just below the title and in smaller or less prominent type. The prefatory
note provides an explanation concerning the entire table or a substantial 
part of it, as in Table 10. 

Explanations concerning individual figures, or a column or row of figures,
should be given in footnotes. Footnotes keyed to stub entries and
column headings may be referred to by means of numbers, as in Table 9;
however, footnotes keyed to figures should be identified by a symbol
(*, †, ‡, etc.), as in Table 7, or by a letter, but preferably not by a
number.

Source notes. As previously indicated, the source note may appear 
below the title or below the footnotes. The latter practice has been gen- 
erally followed in this text. The data set forth in a table will not often 
be material which the investigator has collected. Usually the figures will
have been taken from one or more published or unpublished sources. The
source note should be complete, giving author, title, volume, page, publisher,
and date. Not only is it courteous to mention the source of data
quoted, but such information gives the reader some idea of the reliability 
of the data and makes it possible for him to refer to the original source to 
verify quoted figures or to obtain additional information. Although 
stating the source relieves the quoter of responsibility for shortcomings 
in the original study, the practice does not relieve him of responsibility
for knowingly or carelessly quoting unreliable data. 

Sometimes data are taken from a secondary source instead of a primary 
source, because the secondary source may be more convenient. In such 
a case it may be advisable to mention both sources; for example, “Source:
Automobile Manufacturers Association as quoted in Statistical Abstract
of the United States, 1937, p. 363.”

Data for a table may sometimes be taken from two or more different 
sources. When this is done, care must be exercised to see that the data 
are comparable. The importance of comparability of data was discussed 
in Chapter II; it is not necessary to say more on that topic at this point. 

When apparent mistakes are found in a source, it is well to call atten- 
tion to such difficulties. The December 1935 Monthly Labor Review (p. 
1503) reprints a table from The Oriental Economist showing that total 
payrolls in 10 industries in Japan in 1933 were 647,340,199 yen, but points
out in a footnote that, if the figures given for each of the 10 industries are 
added, the result is 647,430,199 yen. 

Percentages. When percentages are used in a table, the stub or the 
caption entry should indicate clearly to what figures the percentages 
relate. Thus the term “per cent” alone should be avoided; rather say
“per cent of total,” “per cent of increase or decrease,” etc. Sometimes
tables are divided into a “number” section (showing amounts) and a “per
cent” section, as in Table 3, which shows a “value” section and a “per
cent” section. This table and Table 21 illustrate the use of adequate
headings referring to percentages. 

The percentages for the 1937 “exports” section of Table 3 total 100.1,
while those for the 1936 “exports” section total 99.9. When individual 
percentages are written correct to tenths of one per cent, as is customary, 
the total will occasionally be slightly over or below 100.0 because of the 
accumulation of positive or negative remainders when rounding. If the 
percentages had been entered in hundredths or thousandths of a per cent, 
the total would have been closer to 100.0. Although a “per cent of total” 
column may add to slightly more or less than 100.0, the total is shown as 
100.0, since that is what the individual percentages would yield if carried 
out far enough. If a total adds to less than 99.8 or more than 100.2, it is 
advisable to check the calculations for mistakes. 
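
The accumulation of rounding remainders described above can be reproduced directly. The following sketch recomputes the "per cent of total" figures for the exports section from the values (in thousands of dollars) printed in the export tables of this chapter; the function name `percent_of_total` and its `places` parameter are assumptions of this example.

```python
# Exports of United States merchandise by economic class, in thousands of
# dollars: crude materials, crude foodstuffs, manufactured foodstuffs,
# semi-manufactures, finished manufactures.
exports = {
    1937: [721_871, 101_742, 177_451, 677_254, 1_616_598],
    1936: [668_168, 58_144, 143_798, 394_760, 1_154_099],
}


def percent_of_total(values, places=1):
    """Each value as a per cent of the column total, rounded to `places`."""
    total = sum(values)
    return [round(100 * v / total, places) for v in values]


sums = {}
for year, values in exports.items():
    pcts = percent_of_total(values)
    # Round the sum itself only to suppress floating-point noise.
    sums[year] = round(sum(pcts), 1)
    print(year, pcts, "sum of rounded percentages:", sums[year])
```

Calling `percent_of_total(values, places=2)` instead carries the percentages to hundredths, which is the refinement the text mentions for bringing the total closer to 100.0.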

Rounding numbers. In order to avoid confusion and to facilitate comparisons,
numbers of many digits may be rounded. Numbers may also
be rounded because the compiler feels that they are accurate, not to the 
final digit, but only in terms of (say) thousands or millions. 

Table 3 exhibits rounded numbers, and mention of that fact is made in
the caption. When numbers are rounded, a statement to that effect should
be made in a prefatory note or in the stub or the caption. The wording
may be “millions of . . . ,” “000,000 omitted,” and like expressions.

If a series of figures is to be expressed in thousands of dollars, for 
example, the rounding is to the nearest thousand. Thus $2,648,302 would 
become $2,648 (thousand) and $7,226,782 would become $7,227 (thousand).
If the heading “thousands of dollars” appears in the box head (or stub) of
a table, the dollar mark is not needed (see Table 3). 

No serious error is ordinarily introduced by rounding. If each of a 
series of numbers is rounded, some will be raised and some will be lowered,
but the errors so introduced tend to offset each other. Furthermore, it
may be felt that to show all the digits of a large number is to give the appearance
of spurious accuracy. For example, the population of the
United States was ascertained to be 122,775,046 persons in 1930, but the
figure could hardly be accurate to units or even to hundreds. However,
it may be maintained that the figure 122,775,046 is the one obtained by 
the best methods available and is therefore probably more accurate than 
any rounded figure. Irrespective of the merits of these two points of view, 
six (or fewer) significant figures may often be accurate enough for the 
comparisons desired. 

When computed values, such as totals, percentages, and averages, are 
to be shown in tables of rounded figures, these values should, if possible, 
be calculated from the original figures before rounding. 
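
A brief sketch may make the last point concrete. It rounds the two dollar figures quoted above to thousands, and then shows, with two hypothetical figures that are not taken from any table in this text, how a total computed from already rounded items can differ from the total computed from the original figures. The helper name `to_thousands` is likewise an assumption.

```python
def to_thousands(amount):
    """Round a whole-dollar amount to the nearest thousand (halves go up)."""
    return (amount + 500) // 1000


# The two figures quoted in the text.
quoted = [to_thousands(2_648_302), to_thousands(7_226_782)]
print(quoted)  # thousands of dollars

# Hypothetical figures, chosen only so that the two methods disagree.
items = [1_400_400, 2_300_400]

# Preferred: total the original figures first, then round the total.
total_from_originals = to_thousands(sum(items))

# Less accurate: round each item, then add the rounded items.
total_from_rounded = sum(to_thousands(x) for x in items)

print(total_from_originals, total_from_rounded)
```

The two totals differ by one thousand because each rounded item discards a remainder of 400, and the discarded remainders together amount to more than half a unit.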

Totals. We have previously noted that totals, when of major impor- 
tance, may be placed at the top in the stub and at the left in the caption. 
When it is not desired to emphasize totals, they may be placed at the bot- 
tom in the stub and at the right in the caption. 

Table 8 carries both a total column and a total row. An arrangement 
such as this results in a single number (122,775,046) which is sometimes 
termed a “grand total” or a “checked grand total.” The fact that the
figures yield the same sum when added vertically and horizontally is not a 
positive check since two or more compensating errors may have been 
made. That, however, does not often happen. We do have definite 
proof either that no errors were made or that more than one was made. 
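
The check described above can be sketched as follows, using the male and female counts of Table 8. The rows are added across to give row totals, the columns are added down to give column totals, and the two routes to the grand total are compared. Variable names are assumptions made for illustration.

```python
# (male, female) counts by race and nativity, from Table 8.
rows = {
    "Native white": (48_010_145, 47_487_655),
    "Foreign-born white": (7_153_709, 6_212_698),
    "Negro": (5_855_669, 6_035_474),
    "Mexican": (758_674, 663_859),
    "Indian": (170_350, 162_047),
    "Japanese": (81_771, 57_063),
    "Chinese": (59_802, 15_152),
    "Filipino": (42_268, 2_940),
    "Hindu": (2_860, 270),
    "Korean": (1_223, 637),
    "All other": (609, 171),
}

# Adding horizontally: one total for each row.
row_totals = {name: male + female for name, (male, female) in rows.items()}

# Adding vertically: one total for each column.
male_total = sum(male for male, _ in rows.values())
female_total = sum(female for _, female in rows.values())

grand_down = sum(row_totals.values())      # sum of the "both sexes" column
grand_across = male_total + female_total   # sum of the "total" row

print(grand_down, grand_across, grand_down == grand_across)
```

Agreement of the two figures does not rule out compensating errors, as the text notes, but a disagreement would show at once that a mistake had been made.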

Units. The units of measurement of the figures in a column or a row 
of a table may often be self-explanatory. When this is not true, the 
nature of the unit should be made clear in the stub or the box head, as in 
Table 9. If the explanation applies to all figures in the table, it may appear 
as a prefatory note. Data of monetary units are usually self-descriptive, 
because of the use of the dollar sign. Note, in the last three columns of 
Table 9, that this sign appears for only the first entry in a column. 

Size and shape of table. In general a table should be designed so that
it will be neither very long and narrow nor very short and wide. A table
must also be adjusted to the space in which it is to appear. Usually this 
limitation takes the form of a page of a book or a report. Of course, a 
table need not occupy the entire length or width of a page. If the table is 
too large for the allotted space, it may be recast into several smaller 
tables. Reduction of type size may permit a table to be included on a 
page, but reduction should not be made at the expense of legibility. If the 
use of a folded page is not desirable, the table may be arranged to occupy 
two facing pages. Because of the difficulty of aligning pages perfectly in 
binding, the stub should be repeated on the second page. When reference 
tables are continued over several pages, they may be split either vertically 
or horizontally. In either case complete stub and caption entries should 
appear on each page, the title should be repeated on each page, and footnotes
may appear at the bottom of the appropriate page or may be accumulated
at the end of the table.

The horizontal dimension of a table may be determined by allowing for: 

(1) Width of stub, determined by longest entry. (A very long entry 
may be put on two or more lines to save space, see Table 9.) 

(2) Width of each column, determined by largest number or by entry in 
each box head. (By hyphenating words an entry in a box head may be 
compressed horizontally and expanded vertically.) 

(3) Ruling. 

(4) Margins. 

The vertical dimension may be ascertained by considering : 

(1) Space needed for title, prefatory note, footnotes, and source note. 
Since the first line of the title should not exceed the table in width, a long 
title may require several lines. 

(2) Number of lines needed for caption or stub heading.

(3) Number of rows in body of table. 

(4) Ruling. 

(5) Margins. 
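
The allowances for the horizontal dimension listed above can be sketched as a small calculation in typewriter characters. The function `table_width` and its `ruling` and `margin` allowances are assumptions of this example rather than rules given in the text; the sketch shows only how breaking a box head over several lines narrows a column.

```python
def table_width(stub_entries, columns, ruling=1, margin=2):
    """Estimate the width of a table in typewriter characters.

    `columns` is a list of (box_head, cells) pairs. A box head may contain
    newlines, in which case only its longest line counts toward the width.
    `ruling` and `margin` are assumed allowances per column, in characters.
    """
    stub_width = max(len(entry) for entry in stub_entries)
    column_widths = []
    for box_head, cells in columns:
        head_width = max(len(line) for line in box_head.split("\n"))
        cell_width = max(len(cell) for cell in cells)
        column_widths.append(max(head_width, cell_width))
    rules = ruling * len(columns)         # one vertical rule per column
    margins = 2 * margin * len(columns)   # space on each side of a column
    return stub_width + sum(column_widths) + rules + margins


stub = ["Native white", "Foreign-born white", "Total"]
totals = ("Total\nboth sexes", ["95,497,800", "13,366,407", "122,775,046"])
ratio_cells = ["101.1", "115.1", "102.5"]

# Box head on one line versus the same head broken over three lines.
wide = table_width(stub, [totals, ("Males per 100 females", ratio_cells)])
narrow = table_width(stub, [totals, ("Males\nper 100\nfemales", ratio_cells)])

print(wide, narrow)
```

Breaking the box head compresses the table horizontally at the cost of two additional lines in the caption, which must then be allowed for in the vertical dimension.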

Ruling. Most of the tables in this text are shown with single-line ruling
and are open at the sides. Double-line ruling is sometimes used but, to 
the writers at least, seems to make either hand-ruled or printed tables 
appear somewhat complicated. Tables are rarely closed at the sides, and 
should never appear with one side closed and one open. 

There seems to be a growing tendency to use text tables without ruling, 
either vertical or horizontal. Table 11 shows how Table 8 appears when 
no ruling is used. 

An examination of tables in this book and elsewhere will show that: 
(1) No horizontal lines are used in the body of a table except to set off
totals and occasionally to separate a table into distinct parts.

(2) Horizontal lines separating major and minor box heads do not continue
into the stub heading.

(3) All vertical lines separating box heads appear only between the box
heads which they separate; they do not extend above these box heads.

TABLE 11

Population of the United States by Sex and by Race and Nativity, 1930

                                                                       Males
                            Total                                    per 100
Race and nativity         both sexes          Male        Female     females

Native white              95,497,800    48,010,145    47,487,655       101.1
Foreign-born white        13,366,407     7,153,709     6,212,698       115.1
Negro                     11,891,143     5,855,669     6,035,474        97.0
Mexican                    1,422,533       758,674       663,859       114.3
Indian                       332,397       170,350       162,047       105.1
Japanese                     138,834        81,771        57,063       143.3
Chinese                       74,954        59,802        15,152       394.7
Filipino                      45,208        42,268         2,940      1437.7
Hindu                          3,130         2,860           270      1059.3
Korean                         1,860         1,223           637       192.0
All other                        780           609           171       356.1

  Total                  122,775,046    62,137,080    60,637,966       102.5

Source: Fifteenth Census of the United States, 1930, Population Volume II, p. 103.

Guiding the eye. Skipping a line every three, four, or five rows, as in
Table 10, makes it easier for the eye to follow the rows across a table. The 
use of leaders in the stub of a table is also helpful. 

Zeros. It is not customary to show a zero in a table (other than a 
computation form). When no cases have been found to exist or when the 
value of an item is zero, the fact may be indicated by means of dots (...) 
or short dashes ( — ) . When there is no figure for an entry because informa- 
tion is lacking, a footnote should be used to indicate that fact. 

Size and style of type. Too much variety in size or style of type (or 
lettering) is not desirable. In general the title should be most prominent
and is usually set in large and small capitals or in boldface type. The
items listed in the stub and caption and the figures in the body of the table
are usually set in the same size type. Footnotes, prefatory note, and 
source note are generally set in smaller type than that used in the body of 
the table. 

Statistical Reports 

When making a statistical report, the method of preparing the tables 
will be dictated partly by the number of copies of the report required and 
partly by the cost involved. Tables may be handwritten, typewritten,
mimeographed, multigraphed, reproduced by a photostatic or photographic
process from handwritten or typed tables, or printed. 

There is a distinct disadvantage in the use of the ordinary typewriter
for preparing other than relatively simple tables, because of the lack of
flexibility of spacing and of size of type. Table 12 shows a table without
ruling, prepared on an ordinary typewriter with pica type. Table 13
presents the same data and indicates how ruling may be done on a typewriter.
Note that more flexibility was obtained by using two typewriters,
one with pica and one with elite type. By using elite type for the stub
entries and the body, a certain amount of space may be saved. Somewhat
more flexibility in planning a table may be had by using a typewriter with
variable spacing and with different kinds and sizes of type.

TABLE 12

EXPORTS OF UNITED STATES MERCHANDISE AND IMPORTS FOR CONSUMPTION,
BY ECONOMIC CLASSES, 1936 AND 1937

(Materials imported for purposes of re-export are not included.)

                                          Value             Per cent of
                                 (thousands of dollars)     total value
Economic class                         1937        1936     1937     1936

Exports of United States
  merchandise, total              3,294,916   2,418,969    100.0    100.0
    Crude materials                 721,871     668,168     21.9     27.6
    Crude foodstuffs                101,742      58,144      3.1      2.4
    Manufactured foodstuffs         177,451     143,798      5.4      5.9
    Semi-manufactures               677,254     394,760     20.6     16.3
    Finished manufactures         1,616,598   1,154,099     49.1     47.7

Imports for consumption, total    3,012,487   2,423,977    100.0    100.0
    Crude materials                 973,535     732,965     32.3     30.2
    Crude foodstuffs                415,545     348,682     13.7     14.4
    Manufactured foodstuffs         440,105     386,240     14.6     15.9
    Semi-manufactures               634,181     490,238     21.1     20.2
    Finished manufactures           561,325     465,852     18.3     19.2

Source: United States Bureau of Foreign and Domestic Commerce,
Monthly Summary of Foreign Commerce, December 1937, p. 36.

If only a few copies of a report are to be made and if the tables are 
simple, the tables and accompanying text may be typed and carbon copies
made. If several dozen copies are needed, the longhand or typed material
may be photostated at a cost of about 25 cents per 8½ × 11 inch page. By
this method, reduction or enlarging is possible and copies may be had 
rather promptly since no plate need be made. If a larger number of 
copies is required, resort may be had to mimeographing or multigraphing. 

TABLE 13

EXPORTS OF UNITED STATES MERCHANDISE AND IMPORTS FOR CONSUMPTION,
BY ECONOMIC CLASSES, 1936 AND 1937

(Materials imported for purposes of re-export are not included.)

------------------------------------------------------------------------
                              :         Value          :  Per cent of
                              : (thousands of dollars) :  total value
        Economic class        :------------------------:----------------
                              :    1937    :   1936    :  1937  :  1936
------------------------------------------------------------------------
Exports of United States      :            :           :        :
  merchandise, total          :  3,294,916 : 2,418,969 :  100.0 :  100.0
    Crude materials           :    721,871 :   668,168 :   21.9 :   27.6
    Crude foodstuffs          :    101,742 :    58,144 :    3.1 :    2.4
    Manufactured foodstuffs   :    177,451 :   143,798 :    5.4 :    5.9
    Semi-manufactures         :    677,254 :   394,760 :   20.6 :   16.3
    Finished manufactures     :  1,616,598 : 1,154,099 :   49.1 :   47.7
------------------------------------------------------------------------
Imports for consumption,      :            :           :        :
  total                       :  3,012,487 : 2,423,977 :  100.0 :  100.0
    Crude materials           :    973,535 :   732,965 :   32.3 :   30.2
    Crude foodstuffs          :    415,545 :   348,682 :   13.7 :   14.4
    Manufactured foodstuffs   :    440,105 :   386,240 :   14.6 :   15.9
    Semi-manufactures         :    634,181 :   490,238 :   21.1 :   20.2
    Finished manufactures     :    561,325 :   465,852 :   18.3 :   19.2
------------------------------------------------------------------------

Source: United States Bureau of Foreign and Domestic Commerce, Monthly Summary
of Foreign Commerce, December 1937, p. 36.


Tables may also be reproduced by a photo-offset process, which is quite 
satisfactory and is often cheaper than printing because it avoids the type- 
setting. Enlarging or reduction is possible; typed material may be 
reduced so that 4 ordinary 8½ × 11 inch pages (pica type) will appear on
one page. It should be noted that the typed copy should be a first-class
job if satisfactory reproductions are to be obtained. 

Occasionally the gelatin-pan method may be useful when only a few
copies are needed. A special ink is available for handwritten material and
for illustrations; also, ribbons and carbon paper may be obtained for typed
material. This method is less satisfactory than those mentioned above,
but it enables a few copies to be made by anyone with rather
inexpensive equipment. The method does not enlarge or reduce.

GRAPHIC PRESENTATION

SIMPLE CURVES 


The Graphic Method 

Attention has already been given to the presentation of statistical data 
by means of text, tabular, and semi-tabular devices. Ordinarily statistical 
data will be presented either in the form of a table or a chart. This chapter 
and the following two are devoted to a discussion of the graphic devices 
by means of which statistical data may be set forth. As will be readily 
seen from a perusal of the pages of this book, charts or graphs are more 
effective in attracting attention than are any of the other devices. Readers 
are therefore not so likely to skip a chart as a table. A simple, attractive,
well-constructed graph showing a limited set of facts is easier to
understand than is a table.

The outstanding effectiveness of a chart as a device for presenting a
limited amount of data makes it a most useful statistical tool. Certain
limitations should be noted, however. In the first place, charts cannot 
show so many sets of facts as may be shown in a table. Numerous 
columns and rows may appear in a table; but imagine Chart 3 (or any 
other in this chapter) with six or eight criss-crossing and intertwining 
lines, and it is immediately obvious why a chart should show a limited 
amount of information at once. In the second place, although exact 
values can be given in a table, only approximate values can ordinarily be 
shown by a chart. In a table we may enter as many digits as desired, but 
we can plot only the approximate value on a chart. For example, while 
the data upon which Chart 3 is based could be recorded in a table in terms 
of dollars and cents, no such accuracy is possible in the chart. Thus charts 
are useful for giving a quick picture of the general situation but not of 
details. In the third place, charts require a certain amount of time to
construct, since each one is an original drawing. This difficulty, however,
is offset by the added effectiveness which the chart possesses in comparison
with a table.¹


Types of Charts 

In this text we shall discuss: curves or line diagrams; bar charts, involving
one-dimensional comparisons; area diagrams, involving two-dimensional
comparisons (including particularly pie diagrams, which involve one- or
two-dimensional comparisons, or comparisons of angles); volume diagrams,
involving a visualization of the third dimension and three-dimensional
comparisons; and statistical maps. Other specialized types of charts and
certain charts which are graphic but not statistical (for example,
organization and procedure charts) are not treated here, but are discussed in
some of the special references on graphic methods listed at the end of
this and the following chapters.

Chart 2. Axes for Curve Plotting.

¹ William Playfair, who is understood to have “invented outright” the graphic method
in the latter part of the 18th century, says: “The advantage proposed by this method,
is not that of giving a more accurate statement than by figures, but it is to give a more
simple and permanent idea of the gradual progress and comparative amounts, at different
periods, by presenting to the eye a figure [chart], the proportions of which correspond
with the amount of the sums intended to be expressed.” See the article “Playfair and
His Charts,” by H. Gray Funkhouser and Helen M. Walker, in Economic History,
February 1935, pp. 103-109.

Plotting a Curve

When statistical data are shown as curves, the points are plotted in
reference to a pair of intersecting lines, called axes and shown in Chart 2.
The horizontal line is known as the “X-axis,” and the vertical line is
designated as the “Y-axis.” Positive values are shown to the right of
zero on the X-axis and above the zero on the Y-axis; negative values
are placed to the left of zero on the X-axis and below the zero on the
Y-axis. The point at which the two axes intersect is zero for both X and Y
and is referred to as the “zero point,” the “point of origin,” or merely the
“origin.” The positive and negative values on the axes increase in
magnitude as we move away from this origin.

[Vertical scale: millions of dollars.]

Chart 3. Individual Federal Income Taxes in the United States, 1913-1936. The
reader may wish to add the later data to this chart as they become available. (Data
from Statistical Abstract of the United States, 1937, p. 117, and by correspondence.)

When plotting two variables on the axes, one variable is assigned to one
axis and the other to the other axis. It is customary to put the values of
the independent variable on the X-axis and the values of the dependent
variable on the Y-axis. In Chart 3, for example, it is apparent that time
is the independent variable; therefore time appears on the X-axis. Income
tax payments, on the other hand, constitute the dependent variable; hence
they are shown on the Y-axis. In Chart 4 the independent variable is
the amount of income received by dentists, while the dependent variable
is the proportion of dentists receiving various specified incomes. In some
cases two variables may be mutually dependent on each other, or they may
both be dependent upon some other factor. Under such conditions one
variable is more or less arbitrarily placed on one axis and the other variable
on the other axis.

The two axes divide the plotting area into four sections known as
“quadrants.” For reference purposes these quadrants are designated
I, II, III, and IV, as shown in Chart 2. Quadrant I accommodates values
which are positive on both the X- and Y-axes. Quadrant II provides
for values which are negative on the X-axis and positive on the Y-axis.
Quadrant III takes care of values which are negative on both axes.
Quadrant IV is for values which are positive on the X-axis and negative on the
Y-axis.

[Vertical scale: per cent of dentists. Horizontal scale: annual income in
thousands of dollars.]

Chart 4. Annual Net Incomes of Dentists Engaged in General Practice, 1929.
(Data from Maurice Leven, The Incomes of Physicians, p. 131. The University of
Chicago Press, Chicago, 1932.)

Any point plotted in one of the quadrants may be located by referring
to its abscissa, which is its horizontal or X distance from zero, and to its
ordinate, which is its vertical or Y distance from zero. For illustrative
purposes four points have been plotted on Chart 2, one in each quadrant.
P₁ represents X = +4, Y = +2. P₂ indicates X = -3, Y = +3.
P₃ is X = -4, Y = -3. P₄ shows X = +3, Y = -2.
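
The quadrant rule just described can be sketched in a few lines of Python. This is only an illustration and is not part of the original text; the function name `quadrant` and the handling of points that fall on an axis are assumptions made for the example.

```python
def quadrant(x: float, y: float) -> str:
    """Return the quadrant (I to IV) in which the point (x, y) lies.

    Points on an axis belong to no quadrant, so they are reported
    separately (an assumption of this sketch).
    """
    if x == 0 and y == 0:
        return "origin"
    if x == 0 or y == 0:
        return "on an axis"
    if x > 0:
        return "I" if y > 0 else "IV"
    return "II" if y > 0 else "III"


# The four illustrative points plotted on Chart 2.
points = {"P1": (4, 2), "P2": (-3, 3), "P3": (-4, -3), "P4": (3, -2)}
for name, (x, y) in points.items():
    print(name, "lies in quadrant", quadrant(x, y))
```

Run on the four points of Chart 2, the sketch places P1 in quadrant I, P2 in II, P3 in III, and P4 in IV, in agreement with the text.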

When the axes are used as bases of reference for plotting equations, any
or all of the quadrants may be used, since many equations may call for
negative values of X or of Y, or of both. At present, however, we are not
interested in the graphic representation of equations (see pp. 395-397
and 426-432), but in graphically portraying observed statistical data.
When we are dealing with statistical data, it must be obvious that both
the X and Y variables are ordinarily positive quantities, and that therefore
we shall generally use only the quadrant designated as I. Chart 3,
showing the individual income tax payments in the United States over a period
of years, is an example of a curve lying wholly in quadrant I.

Quadrants II and IV are occasionally used in conjunction with quadrant 
I. Chart 4 shows a curve which makes use of quadrants I and II; the 
curve of Chart 5 lies partly in quadrant I and partly in quadrant IV. 
Since both X and Y values are negative in quadrant III, that quadrant 
is very rarely used. 


Types of Data Shown by Curves 

It was noted earlier that statistical data are basically classified according 
to chronological, geographical, quantitative, or qualitative characteristics. 

[Vertical scale: net immigration in thousands.]

Chart 5. Net Immigration (Immigration Minus Emigration) into the United States,
1910-1935, Years Ending June 30. (Based on data of Chart 23.)

Curves are frequently used for picturing time series and for showing fre- 
quency distributions (by far the most important sort of quantitatively 
classified data), although, of course, other types of graphs are also applic- 
able as shown in the following chapters. Qualitatively and, especially, 
geographically classified data are rarely depicted by curves; instead, bar 
charts and other devices are used, as will be indicated hereafter. 

Time series curves. The method of plotting time series depends upon 
the type of data to be represented. We may distinguish between period 
data and point data. Period data, such as total sales per month, average 
monthly sales per year, and average prices during the year, refer to a 
period of time. Point data are those, such as inventory values, price 
quotations, or temperature readings, which refer to a particular point of 
time.

Charts 3 and 5, which represent period data, have dates along their
horizontal scales, as is customary with all time series charts (whether they
are of period data or point data). When annual data of this type are
plotted, the dates on the horizontal scales may be placed below the vertical
lines, as in Chart 3, or below spaces as in Chart 5 and as in the left-hand
part of Chart 29. Either method may be used; one argument for labeling
the spaces is that this gives a visual impression of time as having duration.
When monthly (and daily, weekly, or quarterly) data are plotted over a
period of years, there is no choice but to label the spaces representing each
year since, if the lines were labeled, it would not be immediately obvious
to all readers whether the label referred to the space preceding the label,
the space following the label, or possibly half of the space on each side.
Each horizontal year space is divided into 12 parts for the plotting of the
monthly figures, and these figures are plotted at the middle of each of the
12 spaces. Chart 6 is an illustration of this procedure.

[Vertical scale: dollars.]

Chart 6. Average Weekly Earnings of Employees in New York State Factories,
1934-1938. (New York Department of Labor, The Industrial Bulletin, January 1939,
p. 23.)

[Vertical scale: millions of pounds.]

Chart 7. Cold Storage Holdings of Cheese as of the First of Each Month, 1933-1937.
(Data from Agricultural Statistics, 1938, p. 363.)

When point data are used, label the spaces and plot the observation
within that space at the point of time to which the data refer. Thus, in
plotting monthly data, we should plot the figures at the beginning of each
space representing the month for beginning-of-the-month data, at the
middle of the space for middle-of-the-month data, and at the end of the
space for end-of-the-month data. If this scheme is followed, the results
are as shown in Charts 7 and 8; Chart 7 shows first-of-the-month data and
Chart 8 shows end-of-the-month data. Point data as of the middle of
the month, such as employment data which are often collected as of the
payroll period nearest the fifteenth of each month, would be charted in
similar fashion to the data in Chart 6, which shows period data.
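
The placement rule for monthly figures can be expressed as a small Python sketch. It is an illustration only and is not taken from the original text; the function name `plot_position`, the timing labels, and the convention of measuring the horizontal position in month-widths from the start of the year are all assumptions.

```python
def plot_position(month_index: int, timing: str) -> float:
    """Horizontal position of a monthly figure within its year space.

    month_index: 0 for January through 11 for December.
    timing: "period" (period data, plotted at the middle of the space),
            "beginning", "middle", or "end" (point data).
    The result is measured in month-widths from the left edge of the year.
    """
    offsets = {"beginning": 0.0, "middle": 0.5, "period": 0.5, "end": 1.0}
    if not 0 <= month_index <= 11:
        raise ValueError("month_index must be between 0 and 11")
    if timing not in offsets:
        raise ValueError("unknown timing: " + timing)
    return month_index + offsets[timing]


# January figures under each convention.
for timing in ("period", "beginning", "middle", "end"):
    print(timing, plot_position(0, timing))
```

Under these assumptions a first-of-the-month figure for January falls at 0.0, a period or mid-month figure at 0.5, and an end-of-the-month figure at 1.0, which is also where a first-of-February figure would fall.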

[Vertical scale: dollars.]

Chart 8. Money in Circulation per Capita in the United States at the End of Each
Month, 1934-1938. (Data from Standard Statistics Company, Basic Statistics, p. A-22
and monthly supplements.)

Curves of frequency distributions. The curve of Chart 4 is a graphic
representation of a frequency distribution. Frequency distributions will
not usually continue into the second quadrant as does this one. In this
instance, however, there were some negative incomes.

Table 14 shows a frequency distribution² of the grades of the 1937
graduating class of the United States Naval Academy. In order to show the
genesis of the frequency distribution curve, we shall plot the data first
as a series of rectangles or bars as in the “column diagram” of Chart 9.
It will be noticed that the grades have been placed along the horizontal
axis and the frequencies (number of midshipmen) along the vertical axis.
There are as many columns in the chart as there were classes in the table,
and the height of each column represents the frequencies in the
corresponding class. This column diagram is transformed into a curve by
connecting the midpoint of the top of each rectangle with the adjacent
one, as shown by the dotted line in Chart 9. This is done upon the
assumption that the frequencies in a class interval are evenly distributed
throughout the class. The mid-value of a class is consequently taken as
representing the class.³ It will be observed that the dotted line cuts off
some small triangular pieces of the original rectangles and that it also
includes some small triangles not formerly included, but it is obvious that
triangle A = triangle A', triangle B = triangle B', etc. Sometimes the
curve is continued at each end to join the X-axis (indicating a frequency
of zero) at the mid-value of the next possible class. This procedure
results in having the same area under the curve as is included in the
rectangles. However, the result may sometimes be a curve which extends
beyond zero on the X-axis and this is apt to be meaningless. The
frequency distribution may be shown either as a column diagram or as a
frequency curve (frequency polygon). The latter is more usual and the
curve is plotted directly as in Chart 10, without the intermediate step
of constructing columns.

TABLE 14

Frequency Distribution of Grades of the 1937 Graduating
Class of the United States Naval Academy

    Grade         Number of
                  midshipmen

  68.0-69.9            4
  70.0-71.9           17
  72.0-73.9           39
  74.0-75.9           62
  76.0-77.9           58
  78.0-79.9           52
  80.0-81.9           35
  82.0-83.9           22
  84.0-85.9           18
  86.0-87.9           13
  88.0-89.9            4
  90.0-91.9            2
  92.0-93.9            1

  Total              327

Source: Table 25.

[Vertical scale of Charts 9 and 10: number of midshipmen.]

Chart 9. Grades of the 1937 Graduating Class of the United States Naval Academy,
Shown by a Column Diagram and by a Frequency Curve. (Data from Table 14.)

Chart 10. Grades of the 1937 Graduating Class of the United States Naval Academy.
(Data of Table 14.)

² Frequency distributions are discussed in Chapter VIII.

³ This point is discussed at greater length in Chapter IX.
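
The statement that closing the curve at both ends preserves the area of the rectangles can be checked numerically with the frequencies of Table 14. The sketch below is an illustration and not part of the original text; it assumes that each class is exactly 2 grade points wide (68 to 70, 70 to 72, and so on), so that the mid-values are 69, 71, ..., 93, and the variable names are hypothetical.

```python
# Frequencies from Table 14 (number of midshipmen in each class).
freqs = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]

# Assumption: classes are 2 points wide, the first beginning at 68.
width = 2.0
mids = [68.0 + width * i + width / 2 for i in range(len(freqs))]

# Area of the column diagram: each rectangle is width x frequency.
column_area = sum(f * width for f in freqs)

# Frequency polygon, continued to zero at the mid-value of the next
# possible class at each end.
xs = [mids[0] - width] + mids + [mids[-1] + width]
ys = [0] + freqs + [0]

# Area under the polygon by the trapezoid rule.
polygon_area = sum(
    (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1)
)

print(column_area, polygon_area)
```

Both areas come to 654, that is, 2 times the 327 midshipmen, so each triangle cut off by the line is matched by one that it adds. Without the two closing segments the polygon would fall short of the column area.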

In Table 15 the families in the United States have been classified
according to the number of members in each. The data of this table could be
plotted as a curve, similar in general plan to that of Chart 10. However,
the classification into families of 1, 2, 3, 4, etc., members possesses a lack
of continuity between the various classes, and the data are more properly
shown by means of a bar chart similar in construction to that of Chart 124
(p. 291). The separation of the bars as in the latter chart emphasizes the
lack of continuity between the categories.

TABLE 15

Families in the United States, by Size, 1930

  Number of       Per cent of
  members         all families

      1                7.9
      2               23.4
      3               20.8
      4               17.5
      5               12.0
      6                7.6
      7                4.7
      8                2.8
      9                1.6
     10                 .9
     11                 .6
  12 or more            .4

  Total              100.0

Source: Statistical Abstract of the United States, 1937, p. 50. ... or adoption,
and groups sharing the same living accommodations as “partners.”


Rules for Drawing Curves 

While statisticians have not agreed upon a standard procedure setting
forth in detail exactly how line diagrams should be constructed, there are
certain rather obvious considerations of importance. The student who
is interested in going into more detail in regard to the technique of chart
construction is referred to the books listed at the end of this chapter.

[Vertical scale: rate per 1,000.]

Chart 11. Death Rates per 1,000 Population in the Registration Area of the United
States, 1912-1936. Data available just before going to press give the final rate for
1937 as 11.2 and the provisional rate for 1938 as 10.7. (Data from Statistical Abstract
of the United States, 1937, p. 80, and Bureau of the Census, Vital Statistics, Special
Reports, Vol. 4, No. 54.)

[Vertical scale: rate per 1,000, marked from 10 to 20. Horizontal scale: 1912 to 1936.]

Chart 12. Death Rates per 1,000 Population in the Registration Area of the United
States, 1912-1936. This chart is incorrectly drawn, since the vertical scale begins
with 10 and no clear indication of the omission of the zero is given. (Data from same
sources as Chart 11.)

Zero on vertical scale. The inclusion of a zero on the vertical scale of a
curve is perhaps one of the most important rules. Chart makers
occasionally neglect to observe this principle and the result is always misleading,
since the visual impression is incorrect. In Chart 11, death rates in the
United States from 1912 to 1936 have been plotted with reference to a
vertical scale beginning with zero. The same series of data appear in Chart
12, but on this chart the vertical scale begins at 10. Chart 12 gives the
reader a visual impression which is quite contrary to the facts. The death
rate in 1918 appears to have been about twice that for 1917, whereas 
Chart 11 shows that it was actually about 1| times as large. Again, in 
Chart 12 the death rate for 1928 appears to have been about IJ times that for 
the previous year, whereas from the preceding chart it is clear that the 
1928 rate was really not much greater than for 1927. Very few readers 
notice the omission of zero on a vertical scale, and fewer still are apt to 
make due allowance for the omission in interpreting a curve. It should
not be necessary for a reader to refer to a scale in order to make
approximate comparisons; the chart should be so drawn that comparisons may be
made visually and as quickly as possible. Chart 12 also gives the
impression that death rates have approached very closely to some sort of absolute
lower limit.
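
The distortion produced by omitting the zero can be put in numbers. The Python sketch below is an illustration and not part of the original text; the function name `apparent_ratio` is an assumption, and the two rates used (18 and 14 per 1,000) are hypothetical round figures chosen for the example, not the values plotted in Charts 11 and 12.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the plotted heights of a and b above the scale's baseline.

    With baseline = 0 this is the true ratio a / b. With a baseline
    above zero it is the ratio the eye sees on a chart whose vertical
    scale starts at that baseline.
    """
    if b <= baseline:
        raise ValueError("b must lie above the baseline")
    return (a - baseline) / (b - baseline)


# Hypothetical rates per 1,000, for illustration only.
high, low = 18.0, 14.0

true_ratio = apparent_ratio(high, low)            # scale begins at zero
seen_ratio = apparent_ratio(high, low, 10.0)      # scale begins at 10

print(round(true_ratio, 2), seen_ratio)
```

With these hypothetical figures the true ratio is about 1.29, but on a scale beginning at 10 the higher point stands exactly twice as far above the base line as the lower one.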

There are several ways in which it is possible to show the zero (or to
indicate clearly its omission), and also to avoid placing the curve high up
on the chart. Chart 13 shows a method in which a definite break is made
across the chart. Sometimes the parallel lines are serrated (notched)
instead of wavy. They may be drawn freehand or, as in Chart 13, by
making use of a bread knife as a ruler. Charts 6, 8, 14, and 15 show other
devices which are occasionally used. Notice that Charts 6, 8, and 13
show the zero and a scale break, while Charts 14 and 15 do not show
the zero but merely call attention to the fact that the vertical scale is
incomplete.

[Horizontal scale: 1912 to 1936.]

Chart 13. Death Rates per 1,000 Population in the Registration Area of the United
States, 1912-1936. (Data from same sources as Chart 11.)

Chart 16 appeared in the October 26, 1934, issue of Railroad Data and
was also used as part of an exhibit in a hearing before the Interstate
Commerce Commission. The accompanying text indicated that the vertical
scale values were “pounds of coal required to move 1,000 gross ton miles of
freight.” Attention was directed to the increasing economy in the use of
coal, but no warning was given as to the omission of the zero. The result
is a misleading visual impression of a very marked increase in economy.
It might, of course, be argued that this chart should not have zero for a
base line, but rather a figure representing the minimum number of pounds
of coal theoretically required to move 1,000 gross ton miles of freight under
ideal conditions. Nevertheless, the chart as it stands is subject to
possible misinterpretation.

[Vertical scale: rate per 1,000.]

Chart 14. Death Rates per 1,000 Population in the Registration Area of the United
States, 1912-1936. (Data from same sources as Chart 11.)

Occasionally curves will be seen lacking a zero on the vertical scale and 
purporting to show the growth of sales of a commodity or of membership in 
an organization. The omission of the zero makes the growth appear to be 
much more rapid than it really has been. 

[Vertical scale: rate per 1,000.]

Chart 15. Death Rates per 1,000 Population in the Registration Area of the United
States, 1912-1936. (Data from same sources as Chart 11.)

Chart 16. Pounds of Coal Consumed per 1,000 Gross Ton-Miles of Freight Carried
by Class I Railroads in the United States, 1920-1933. The vertical scale values are
pounds of coal required to move 1,000 gross ton-miles of freight, all fuels used having
been expressed in equivalent pounds of coal. (From Railroad Data, October 26, 1934.)

Chart 17. Retail Cost of Food in the United States, 1919-1935, 1923-1925 = 100.
(From Monthly Labor Review, February 1936, p. 495.)

Chart 17 shows index numbers of the retail prices of food. This chart
is unusual in two respects. In the first place it carries a zero for the
vertical scale, which, though not wrong, is not necessary when price index
numbers are being plotted, because it is hardly conceivable that prices
will ever approach zero and because the base 100 is the basis of comparison.
The 100 line should always be emphasized when it is the base, as in this
chart. Similarly the zero line should be emphasized, as in Chart 13, when
it is the base of the chart. When charting index numbers, some persons
prefer to show the fluctuations above and below the base in terms of
positive and negative values. In the case of Chart 17, 100 would become
zero, 125 would become +25, and 75 would become -25. The vertical
scale of Chart 17 would be altered to read +75, +50, +25, 0, -25, -50,
-75, -100. The curve itself would remain unchanged. The second
unusual feature of Chart 17 is the treatment of the horizontal and vertical
guide lines, which results in giving the curve an unusually clear profile.
Notice also that space has been left to add later data. This practice allows
the same original chart to be reproduced month after month by merely
extending the curve as new data become available.
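
The relabeling of an index-number scale as deviations from the base amounts to subtracting the base from each scale value. A minimal Python sketch follows; it is an illustration and not part of the original text, and the function name `deviation_from_base` is an assumption.

```python
def deviation_from_base(index_values, base=100.0):
    """Express index numbers as positive or negative deviations from the base."""
    return [value - base for value in index_values]


# The scale values mentioned in the text for Chart 17.
old_scale = [175, 150, 125, 100, 75, 50, 25, 0]
new_scale = deviation_from_base(old_scale)
print(new_scale)
```

Every value is shifted by the same amount, so the differences between successive points are untouched. That is why only the scale labels change and the curve itself keeps its shape.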

Ruling curves. The curve or curves representing the data should stand
out clearly from the background of the chart. The curve should therefore
be ruled more heavily than the coordinates. (When two or more curves
are shown which follow each other closely, it is sometimes necessary to
use lightly ruled lines for the curves. See, for example, Chart 187, p. 507.)
As will be seen from the various charts, the plotted points are not usually
shown since the attempt is to show the general situation rather than the
individual readings.

When several curves are drawn on the same axis, it is essential that
each curve stand out clearly. Thus we may use solid, dotted, and dashed
lines, and we may use heavy and light lines. If a light line is used for
a curve it should ordinarily not be so light as the coordinates. The
suggested rulings are listed below as A and B.


A. These lines are recommended if not more than three curves are to be drawn.

B. If more than three curves are to be drawn, these lighter lines may be used.

C. These lines are not recommended unless plotted points are to be indicated by
means of the circles or dots.


When two or more curves appear on a chart, each should be clearly 
identified. This may be accomplished, preferably, by labeling the curves 
as in Chart 28, or by means of a legend as in Chart 23. 

It is ordinarily well to avoid the use of more than two or three curves 
on one chart. Particularly if they cross and re-cross, confusion is likely 
to result. When several curves appear on a large wall chart which is to be 
presented to a group, different colors may occasionally be used, though it is 
usually better practice to reserve the use of color for those occasions when 
special emphasis is to be placed on one or two curves. Black, red, green, 
light or medium blue, and medium or dark orange are readily distinguished.
If there is a likelihood that the wall chart is to be photostated, photo- 
graphed, or reproduced for printing, black and red may be used in solid 
and broken, light and heavy, combinations since the red line will reproduce 
as black. Blue, yellow, and some shades of green photograph either not 
at all or faintly. 

Coordinates. Chart makers emphasize the zero line by making it a 
little heavier than the other marginal lines. In similar fashion a 100 per 
cent line (or other base with which comparisons are made) may be stressed. 
The marginal vertical and horizontal lines may be made slightly heavier 
than the other coordinate lines. 

The coordinate lines should be drawn very lightly. No more coordinate
lines should appear than are necessary to assist in reading the chart.
Occasionally all coordinates are omitted, as in Chart 5. If it is desired
to have a closely ruled grid in order to make plotting easy, an effaceable
ruling may be had. The coordinates of this paper may be washed off after
they have served their purpose.⁴ When a chart is to be reproduced, a
closely ruled grid of light blue may be used. The lines which should
appear in the reproduction are ruled in black. The blue lines of the
background do not show up in the reproduction under ordinary conditions.
Most of the charts in this text were drawn on such a light blue background.

In order to insure a proper understanding of a chart, the two scales
should be clearly labeled. Not only should the nature of the variable be
indicated, but the unit used should also be stated. Note, for example, in
Chart 4 the horizontal axis shows incomes, the unit being thousands of
dollars. Chart 16, however, has no label on the vertical scale; it is
necessary to read the accompanying text in order to understand the chart.
Occasionally a curve of a long time series may be rather extended
horizontally. In such instances it is sometimes desirable to repeat the vertical
scale at the right of the chart, as in Chart 17.

⁴ From Carl Schleicher and Schull, 167 East 33rd Street, New York City. Since no
ink is entirely waterproof, best results will be obtained by drawing the chart in hard
pencil, washing off the coordinates, and then inking in the lines for the finished chart.

Chart 18. Individual Federal Income Taxes in the United States, 1913-1936. The
vertical scale is exaggerated. The use of ’18, ’23, etc. is ordinarily to be avoided.
(Data from same source as Chart 3.)

Chart 19. Individual Federal Income Taxes in the United States, 1913-1936. The
horizontal scale is exaggerated. (Data from same sources as Chart 3.)

Chart 20. Population of the United States, 1790-1930. (Data from Fifteenth Census
of the United States, 1930, Population, Volume 1, pp. 10-11.)

Chart proportions. It is hardly possible to give an objective rule as to
the proper proportions for a curve diagram. It should be noted, however,
that bizarre impressions result from over-expanding or over-contracting
either scale used for a curve. In Chart 18 the vertical scale is
exaggerated in relation to the horizontal scale; in Chart 19 the horizontal scale
is exaggerated. The former gives an impression of tremendous
fluctuations; the latter conveys the idea that the level of individual income tax
payments has undergone relatively unimportant fluctuations. These two
charts indicate distorted results of replotting the data shown properly in
Chart 3. Rules of thumb are often unsatisfactory because they are apt to be
adopted blindly. However, it has been suggested that the proper
proportions are those which result in a 45 degree angle for the movements of the
curve which are to be emphasized.
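
One way to apply the 45 degree suggestion is to work out the chart height that makes a chosen movement of the curve rise at that angle. The Python sketch below is an illustration under stated assumptions and not a procedure given in the original text. It assumes that both scales are arithmetic and fill the whole plotting area, so that a movement of dy over dx is drawn with slope (dy / y_range × height) / (dx / x_range × width). The function names and the numerical values are hypothetical.

```python
import math


def height_for_45_degrees(width, x_range, y_range, dx, dy):
    """Chart height at which a movement of dy over dx is drawn at 45 degrees.

    width: width of the plotting area (any unit, e.g. inches).
    x_range, y_range: total extent of the horizontal and vertical scales.
    dx, dy: the movement to be emphasized, in the units of each scale.
    """
    if dx <= 0 or dy == 0:
        raise ValueError("dx must be positive and dy must be non-zero")
    return width * (dx / x_range) * (y_range / abs(dy))


def plotted_angle(width, height, x_range, y_range, dx, dy):
    """Angle in degrees at which the movement appears on the finished chart."""
    rise = abs(dy) / y_range * height
    run = dx / x_range * width
    return math.degrees(math.atan2(rise, run))


# Hypothetical example: a chart 6 inches wide covering 24 years and
# 1,200 units on the vertical scale; the emphasized movement is a rise
# of 600 units in 4 years.
height = height_for_45_degrees(6.0, 24.0, 1200.0, 4.0, 600.0)
angle = plotted_angle(6.0, height, 24.0, 1200.0, 4.0, 600.0)
print(round(height, 3), round(angle, 1))
```

For this hypothetical case the height comes to 2 inches. As the text warns, such a rule of thumb should not be adopted blindly: a different choice of the movement to be emphasized gives different proportions.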

Chart 20 shows the population of the 
United States from 1790-1930. Charts 21 
and 22 show the same data; however, Chart 
21 gives a visual impression of rapid growth, 
while Chart 22 gives a visual impression of 
slow growth. A chart could, of course, be 
made to show amount of growth (or per 
cent of growth) for each census in relation 
to the preceding. In this case the vertical
scale would show "amount of increase over
preceding census" (or "per cent of increase
over preceding census").

Although the two charts just referred to 
are curves of time series, it should be under- 
stood that the same false impressions are 
given by frequency curves if one scale is 
over-expanded. 

Lettering. All lettering on a chart, including
scale labels, scale values, legend,
curve labels, and any other words or figures 
should be placed horizontally, if possible. 

Occasionally space limitations may necessi- 
tate placing the vertical scale label in a 
vertical position, as in Chart 23, but such a 
limitation is not often present. Needless 
to say, all lettering should be legible. Free- 
hand words and figures may be made very 
attractive when executed by a skilled per- 
son (see, for example, Chart 135). The 
amateur may, however, make excellent
formal letters and figures (with a little prac- 
tice) by the use of stencil lettering devices 



Chart 21. Population of the 
United States, 1790-1930. The 
horizontal scale is poorly chosen in 
respect to the vertical scale. It is 
usually better to avoid writing the 
horizontal scale values in the man- 
ner shown here. (Data from same 
source as Chart 20.)

available from artists' or draftsmen's supply houses. Nearly all of the
charts in this text, except those reproduced from other publications, were 
lettered by means of such devices. The lettering of Charts 39 and 62 and 
the inserts on a number of other charts (for example. Charts 4, 88, 89, 90, 
91, and several in Chapter XII) was done by means of a Vari-Typer com- 
posing machine, which is essentially a typewriter using many different kinds 
of type.5

Chart 22. Population of the United States, 1790-1930. The vertical scale is poorly
chosen in respect to the horizontal scale. (Data from same source as Chart 20.)



5 The stencils are available from Keuffel and Esser Co., 127 Fulton Street, New York
City, and from Wood-Regan Instrument Co., Nutley, New Jersey. The Vari-Typer is
sold by the Ralph C. Coxhead Corporation, 17 Park Place, New York City.

Title. Each chart, like each table, should have a title, which should
state clearly and succinctly what the chart purports to show. The title 
of a printed chart may appear either above or below the chart, but is 
usually below. The titles of large wall charts are often placed above the 
grid. 

Source. Again, as in the case of a table, each chart should contain a 
source reference to indicate the author, title, volume, page, publisher, and 
date of the publication from which the data were taken. Naturally the 
cautions regarding comparability of data taken from the same source 
or different sources, mentioned in Chapter II, apply with full force to the 
figures used for making charts. 

Line Diagrams for Special Purposes 

Net balance charts. Chart 5 shows one method of indicating the net 
total of two series. For each of the years, total emigration was subtracted 
from total immigration and the result plotted as a positive or negative

Chart 24. Cleveland Trust Company Index of Business Activity in the United
States, 1919-1938. The complete series runs from 1790 to date. (Data from various
issues of The Cleveland Trust Company, Business Bulletin.)

figure. The balance of trade (value of exports minus value of imports) 
may be shown in the same manner, as may also profit and loss. An alter- 
nate method of showing the migration data is illustrated in Chart 23. 
Here the curves of immigration and of emigration are given; and the excess
of immigration is indicated by the height of the shaded area, while the 
excess of emigration is shown by the height of the black portion. 

Silhouette charts. Chart 23 (referred to in the preceding paragraph) il- 
lustrates not only the showing of net amounts rather than gross amounts, 
but likewise the practice of shading the area between the curves in order to
obtain emphasis. Chart 24 is similar to Chart 5 in that it shows fluctua- 
tions above and below a base line. In Chart 24, however, the areas of the 
curve have been emphasized by filling in with black. The result is a more
striking portrayal of the "plus" and "minus" parts of the curve. A chart
of this type is even more effective when the "plus" areas are filled in with
black and the "minus" areas are filled in with red. Another example of a
silhouette chart is shown by Chart 255, page 815.

Maximum variation charts. The Library of Columbia University dis- 
played in an illuminated glass case a number of valuable old prints. For 
the proper preservation of the prints it was desired to maintain the tem- 
perature between 70 and 80 degrees Fahrenheit. The problem consisted 
of adjusting radiation of heat from the case, ventilation and conduction, 
and the proximity to nearby radiators so that the temperature inside the 
case would remain within the desired limits. A recording thermometer 
was placed in the case and the temperature was continuously recorded over 



Chart 25. Temperature Fluctuations in a Library Display Case. Temperature is in 
degrees Fahrenheit. The curved ordinates are made to correspond to the arc described 
by the recording pen of the thermometer. (From the library of Columbia University.) 

an extended period. In Chart 25 a four-day section of one of the charts 
is shown. During these days there was no heat in the adjoining radiator, 
and it may be seen that the temperature never fell below 70 degrees but 
did slightly exceed 80 degrees on several occasions. On Thursday, Friday, 
and Saturday the library was open to the public from 8 a.m. to 10 p.m.; 
on Sunday, from 2 to 6 p.m. The dashed lines have been added by the 
authors and serve to stress the limits beyond which the temperature should 
not fluctuate. 

Range charts. Chart 26 shows a device by means of which the range of 
stock prices may be depicted. It will be noticed that the black band 
expands when the range is greater and contracts when the range is smaller.
The white line indicates the closing price. An alternate method of show- 
ing the same figures is illustrated in Chart 27. Here the top of each bar 
represents the high for the day, while the bottom of each bar represents

Chart 26. High, Low, and Closing Prices of 100 Stocks as Shown by the New York
Herald Tribune Averages, November and December, 1938. Data are plotted for only
those days during which the stock exchange was open. (Data from New York Herald
Tribune.)

the low for the day. The line connecting the bars represents the closing 
price. Charts such as these may be used for showing commodity prices
and other sorts of data if it is desired to show a range of variation over a 
period of time. 

Z charts. The Z chart consists of three curves on the same axes as shown 
in Chart 28. Usually the chart covers a period of one year, by months. 

Chart 27. High, Low, and Closing Prices of 100 Stocks as Shown by the New York
Herald Tribune Averages, November and December, 1938. Data are plotted for only
those days during which the stock exchange was open. (Data from New York Herald
Tribune.)

One curve shows the monthly figures, another shows the cumulative figures
from the beginning of the year, while the third shows the total for the 
twelve months ending with each month. This last curve is generally 
called the moving annual total curve; more specifically it is a 12-month
moving total for the twelve months ending with each specified month. 
Two vertical scales are used with the Z chart since, if the monthly data were 
plotted against the same scale as the other data, the fluctuations of the 
monthly data would not be apparent. The Z chart is often used for in- 
ternal business purposes, showing, for example, data of production and 
sales. There is no reason why it should not be more widely used for showing
other types of data such as those shown in Chart 28. It is, of course,
limited to those situations in which the chart maker is interested in visualizing:
(1) the figure for a given month, (2) the cumulative figure for
that part of the calendar (or fiscal) year which has elapsed, and (3) the
figure for the twelve months ending with each given month.

Chart 28. Intake of Cases New to a Family Service Agency: Monthly, Cumulative,
and Moving Annual Total, 1937. (Data from Welfare Council of New York City.)
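
The three curves of a Z chart can be derived from two years of monthly figures. The following is a minimal sketch under stated assumptions: the function name is hypothetical and the monthly figures are invented for illustration, not taken from Chart 28.

```python
from itertools import accumulate


def z_chart_series(previous_year, current_year):
    """Return (monthly, cumulative, moving_annual_total) for the current year."""
    if len(previous_year) != 12 or len(current_year) != 12:
        raise ValueError("each year needs twelve monthly figures")
    # Cumulative figures from the beginning of the current year.
    cumulative = list(accumulate(current_year))
    combined = list(previous_year) + list(current_year)
    # Total for the twelve months ending with each month of the current year.
    moving_total = [sum(combined[i + 1 : i + 13]) for i in range(12)]
    return list(current_year), cumulative, moving_total


# Invented data: 10 cases a month last year, 12 cases a month this year.
monthly, cumulative, moving_total = z_chart_series([10] * 12, [12] * 12)
print(cumulative[-1], moving_total[-1])
```

In December the cumulative curve and the moving annual total coincide, which is what closes the top of the "Z".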

Except for special purposes such as this, it is not usually desirable to
use two, or more, vertical scales (sometimes referred to as "multiple scales")
on a chart of the type described in this chapter. The occurrences of fluctuations
(but not their magnitudes) in two series expressed in different
units may occasionally be compared on a chart having two different vertical 
scales. However, the use of two, or more, different vertical scales is likely 
to give false visual impressions of the comparative magnitudes of changes 
occurring in the various series.

Varying horizontal scale charts. Occasionally it is desired to show
annual data over a number of years, and monthly data for one or two more 
recent years. This may be done as in Chart 29, in which the horizontal 
scale is expanded to show the monthly data in more detail. Notice that 
the two parts of the chart are separated by a break. Similarly, a change 
in horizontal scale may be in order if we wish to show a combination of 
annual or monthly data with weekly data, or a combination of annual, 
monthly, or weekly data with daily data. 

Multiple axis charts. Occasionally it is desirable to compare the fluc- 
tuations of several curves and yet to have each curve stand out clearly. 
A simple method of accomplishing this result is to plot the different 

Chart 29. Federal Reserve Board Index of Industrial Production in the United
States. Annually 1919-1936 and Monthly 1937 and 1938. (Data from Federal Reserve
Bulletin, January 1939, p. 62, and press releases; 1923-1925 = 100.)

curves along different horizontal axes, these different X-axes being ar- 
bitrarily separated by convenient vertical distances. An illustration is 
Chart 170. Here the different curves have been brought close together 
for ease of comparison, but there is no crossing of the lines. Although 
different horizontal axes are employed, the vertical and horizontal scales 
remain the same. In interpreting such a chart on arithmetic graph 
paper (as distinguished from semi-logarithmic graph paper described 
in the following chapter), it should be remembered that the comparison 
afforded is that of absolute and not of relative changes. It is unlikely that 
the use of this type of chart will be found desirable for presentation to the 
general reader, unless the diagram is accompanied by a clear explanation. 

Chart 30. Population of the United States in Each Specified Age Group, 1850-1930.
(Data from Fifteenth Census of the United States, 1930, Population Volume II, p. 576.)

Chart 31. Proportion of the Population of the United States in Each Specified Age
Group, 1850-1930. (Data from Fifteenth Census of the United States, 1930, Population
Volume II, p. 576.)

Component part charts. Chart 30 shows the number of persons in the
United States at each census from 1850 to 1930, in each of four general age
groups. The width of each band indicates the number of persons of each age group in the
country at a given census. It is possible to observe, from this type of 
chart, whether or not a given group is increasing or decreasing, and whether 
or not the total of all groups is increasing or decreasing. The relative 
importance of a particular group cannot be visualized from Chart 30, but
in Chart 31 the age groups are shown according to the proportions which
they constitute of the total population. Here it may be clearly seen that
there has been a decrease in the proportion of younger persons and an in- 
crease in the proportion of older persons in the population. When com- 
ponent part data covering a few years are to be shown graphically, a bar 



Chart 32. Total, Active, and Inactive Cotton Spindles in the New England and 
Southern Cotton Growing States, 1906-1938. (From Monthly Labor Review, December
1938, p. 1245.) 

chart such as Chart 71 or 72 may be used. When a number of years are 
to be shown, the general trend can be more easily pictured by curves. 
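
Converting the absolute bands of a chart such as Chart 30 into the proportional bands of a chart such as Chart 31 is a matter of dividing each component by the total. The sketch below uses invented counts and hypothetical group labels purely for illustration; they are not census figures.

```python
def proportions(groups):
    """Return each component as a per cent of the total of all components."""
    total = sum(groups.values())
    return {name: 100 * count / total for name, count in groups.items()}


# Invented counts (in thousands) for one census year; labels are hypothetical.
counts = {"under 20": 80, "20 to 44": 70, "45 to 64": 36, "65 and over": 14}
shares = proportions(counts)
for name, share in shares.items():
    print(f"{name}: {share:.1f} per cent")
```

Repeating the calculation for each census gives the band widths of the proportional chart, which always sum to 100 per cent.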

Chart 32 is another illustration of a component part chart. In this 
chart one part, inactive cotton spindles, is emphasized by shading. The 
data of total spindles and of active spindles are plotted above the zero line. 
The difference between these two is the shaded area, inactive spindles. 

Frequency distribution and range chart. In 1928 the United States 
Personnel Classification Board made a study of the white-collar workers 
employed by the Federal Government outside of Washington and exclusive 
of postal employees. One phase of the investigation involved the collec- 
tion of data concerning the salaries paid in comparable occupations in pri- 
vate employment. Chart 33 shows a method by which comparison was 
made between the distribution of salaries paid to one group of employees in
private concerns (referred to as "general industry" on the chart) and the
salary range for similarly qualified persons employed by the government. 
The source from which this chart was taken shows a number of such dia- 
grams, for various occupations and occupation groups. In some instances 
the government was paying more than the usual rate in private employment;
in some instances the government rate was below the going private
rate; in some instances there was substantial agreement; in every chart 
shown, however, the range of salaries paid by the government was narrower
than in private employment. This may reflect a more exact classification
of employees than in the aggregate of private concerns studied.




Chart 33. Annual Salaries Paid to Employees in General Industry Corresponding 
to the CAF-1 Classification and Government Field Service Range of CAF-1 Salaries, 
1928. The CAF-1 grade is the lowest grade of worker in the "clerical, administrative,
and fiscal service." It includes adding machine operators, duplicating machine operators,
addressing machine operators, file clerks, punch card operators, routine typists,
and so forth. (Reproduced from United States Personnel Classification Board, Report
of Wage and Personnel Study, House Document No. 602, 70th Congress, 2nd Session,
p. 198.)

Instead of comparing a frequency curve of private salaries with a range 
for government salaries, as in Chart 33, a comparison of two frequency 
curves could have been shown. This sort of diagram is discussed in
Chapter VIII. It possesses the advantage of showing not only the range 
of the second set of data, but also its distribution within that range.

CHAPTER V

GRAPHIC PRESENTATION 

THE SEMI-LOGARITHMIC OR RATIO CHART 


Absolute and Relative Growth 

When considering the development of a series of statistical data over a 
period of time, we are sometimes interested in the amount of growth that 
has taken place, but more often we wish to know something about the 
relative growth or rate of change.1 Diagrams such as Charts 3, 8, and

Chart 34. An Arithmetic Progression (A) and a Geometric Progression (B) Plotted on
an Arithmetic Grid. (Data of Tables 16 and 17.)

1 The terms "relative" and "rate" are susceptible of various meanings. In this book,
"relative change" (increase or decrease) and "rate of change" (increase or decrease)
are used to mean change in relation to the value of the same series at a preceding date.
This is sometimes referred to as "rate per cent of change" or "percentage rate of change."
The change would be "per annum" if the comparisons were yearly.

various others in Chapter IV are of the familiar type, having what are
termed arithmetic scales, and are of use, primarily, for indicating absolute 
changes in the factor shown on the Y-axis. It is the purpose of this dis- 
cussion to explain a slightly different sort of grid which enables one to 
visualize the relative change that is taking place in a plotted series. 

The ability of the usual type of chart to give a satisfactory visual im- 
pression of absolute, but not of relative, changes is brought out by Chart 
34. Curve A represents a constant amount of increase of 200 units per 
year (see Table 16), and this, or any other, arithmetic progression (constant
amount of increase or decrease) will be depicted by a straight line when
plotted on the conventional or arithmetic grid. Curve B, however, is
the result of plotting a series of figures which begin with 128 and increase 
50 per cent each year (see Table 17). It will be noticed that this curve 
is not a straight line; the curve bends upward more and more sharply as 
time passes. 
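
The two series behind Chart 34 can be regenerated from their definitions. The sketch below is illustrative only; the function names are hypothetical, while the starting values, the constant amount (200), and the constant rate (50 per cent) are those stated in the text.

```python
def arithmetic_progression(first, step, n):
    """n terms, each exceeding the one before by a constant amount."""
    return [first + step * k for k in range(n)]


def geometric_progression(first, ratio, n):
    """n terms, each a constant multiple of the one before."""
    return [first * ratio ** k for k in range(n)]


years = list(range(1931, 1939))
curve_a = arithmetic_progression(0, 200, len(years))    # Table 16
curve_b = geometric_progression(128, 1.5, len(years))   # Table 17

for year, a, b in zip(years, curve_a, curve_b):
    print(year, a, round(b))
```

Curve A adds the same amount each year, so its plotted points lie on a straight line; Curve B multiplies by the same factor each year, so each yearly rise is larger than the last.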

TABLE 16

An Arithmetic Progression

Year          Y value    Amount
(X value)                of increase

1931               0
1932             200        200
1933             400        200
1934             600        200
1935             800        200
1936           1,000        200
1937           1,200        200
1938           1,400        200


TABLE 17

A Geometric Progression

Year          Y value    Per cent
(X value)                of increase

1931             128
1932             192         50
1933             288         50
1934             432         50
1935             648         50
1936             972         50
1937           1,458         50
1938           2,187         50


A series showing a constant rate of increase or decrease is known as a 
geometric progression, and any geometric progression will yield a curved
line when plotted on an arithmetic grid.2 A geometric increase is represented
by a curve which slopes upward and is concave upward, as in Curve
B of Chart 34; a geometric decrease is represented by a curve which slopes 
downward and is concave upward. A serious difficulty in interpreting
such curves, however, lies in the fact that the eye cannot discern whether
a particular curved line represents a constant rate
of change. Chart 35 depicts a series which is neither an arithmetic nor a
geometric progression. The data of Table 18 show that the series increases

Chart 35. A Series Increasing by Increasing Amounts. This series is not a geometric
progression, but may give that visual impression. (Data of Table 18.)

more rapidly than an arithmetic progression, and the eye can grasp this 
fact because the curve bends upward. The table also indicates that the 
series increases at a rate which is not constant. Visually, however, this 
fact is not apparent. It is not possible for the reader of an arithmetic 
chart to be sure whether a curved line such as this represents a constant 
rate of increase, a rate of increase which is diminishing, or a rate of increase 
which is accelerating. Any series of figures that increases more rapidly 

2 A curve representing a geometric progression is termed an "exponential curve" and
is indicated by the equation $Y = ab^X$. The reader may be familiar with this equation
in the form $P_n = P_0(1 + r)^n$, which is the compound interest equation and is discussed
on pp. 225-226. A straight line representing an arithmetic progression is indicated
by $Y = a + bX$.

than an arithmetic progression (for example, 10, 12, 15, 19, 24, 30) slopes
upward and is concave upward when plotted on an arithmetic grid; any 
series of figures that decreases less rapidly than an arithmetic progression 
(for example, 100, 91, 83, 76, 70, 65) slopes downward and is concave up- 
ward when shown on arithmetic coordinates. 

TABLE 18

A Series of Increasing Values

Year          Y value    Amount         Per cent
(X value)                of increase    of increase

1931              50
1932              80         30            60.0
1933             160         80           100.0
1934             300        140            87.5
1935             550        250            83.3
1936           1,080        530            96.4
1937           1,730        650            60.2
1938           2,500        770            44.5
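
The last two columns of Table 18 follow directly from the Y values. As a brief illustration (the function names are hypothetical), the amounts and per cents of increase may be computed as follows:

```python
series = [50, 80, 160, 300, 550, 1080, 1730, 2500]  # Y values of Table 18


def amounts_of_increase(values):
    """Difference between each value and the one preceding it."""
    return [b - a for a, b in zip(values, values[1:])]


def percents_of_increase(values):
    """Increase over the preceding value, as a per cent of that value."""
    return [round(100 * (b - a) / a, 1) for a, b in zip(values, values[1:])]


amounts = amounts_of_increase(series)
percents = percents_of_increase(series)
print(amounts)
print(percents)
```

The amounts grow every year, which is why the curve bends upward, yet the per cents rise and fall; the series is therefore neither an arithmetic nor a geometric progression.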


Before proceeding to develop the basis for the semi-logarithmic or ratio 
grid, which will enable us to visualize rates of change, let us examine fur- 
ther the arithmetic grid. Chart 36 shows the growth of motor vehicle 
registrations in the United States and in Canada from 1917 to 1938. We 
can see from this chart that registrations in the United States increased 
rapidly and, apparently, in approximately an arithmetic progression from 
1917 to 1929; held fairly constant from 1929 to 1930; dropped in 1931, 
1932, and 1933; and resumed the upward movement from 1934 to 1937, 
only to fall slightly in 1938. Changes in registration in Canada are 
difficult to see because the scale which must be used to accommodate the
United States causes the curve for Canada to fall rather close to the 
base line. However, it appears that registrations in Canada increased 
from 1917 to 1930; decreased in 1931, 1932, and 1933; and increased in the 
five following years. It is quite obvious that from 1917 to 1929 the amount 
of increase each year was greater for the United States than for Canada, 
but there is no way of knowing from the appearance of the curves which 
country had the greater relative increase. 

It would not do to replot the data of Chart 36 by using one vertical 
scale for the United States and another for Canada, in order to magnify 
the movements of the curve for the latter. The fact that one curve is below 
another on an arithmetic grid tells us at a glance that the lower curve repre- 
sents a series of smaller magnitude than does the upper. If two vertical 
scales are used, we have really two distinct, non-comparable charts, and
no satisfactory visual comparisons may be made in respect to (1) the size
of the two series plotted, (2) the amount of change which has taken place
in one series in comparison with the amount of change in the other, or (3)
the rates of change of the two series.

A Grid to Show Rates of Change 

[Chart 36. Vertical scale: millions of vehicles.]

From what has already been said it must be obvious that graphic comparisons
in respect to rates of change will be facilitated if we can employ
a sort of grid which will make a constant rate of increase appear as a
straight line. In Table 19 the geometric progression of Table 17 and Chart 
34 is again shown and with it are given the logarithms of the various num- 
bers. Examination of these logarithms reveals that they form an arith- 
metic progression; therefore, if these logarithms are plotted on an arith- 
metic grid, a straight line will result, as may be seen in Chart 37. This is 

TABLE 19

A Geometric Progression and Logarithms of the
Geometric Progression

Year          Y value    Logarithm     Amount of increase
(X value)                of Y value    of logarithms

1931             128     2.107210
1932             192     2.283301      .176091
1933             288     2.459392      .176091
1934             432     2.635484      .176092*
1935             648     2.811575      .176091
1936             972     2.987666      .176091
1937           1,458     3.163758      .176092*
1938           2,187     3.339849      .176091

* These values differ slightly because logarithms have been rounded
to nearest millionth.

Chart 37. Logarithms of a Geometric Progression Plotted on an Arithmetic Grid.
(Data of Table 19.)

one way of accomplishing our objective, but it involves the additional step
of looking up logarithms before the data can be plotted. However, in- 
stead of plotting the logarithms of the values of a series, we use a grid which 
is designed with a logarithmic vertical scale, as in Chart 38. Here, again, 
we find that the geometric progression appears as a straight line. A grid 
of this type is termed semi-logarithmic because one scale is logarithmic and
the other is arithmetic. Because of the assistance which this type of ruling

Chart 38. A Geometric Progression Plotted on a Semi-Logarithmic or Ratio Grid.
Printed semi-logarithmic forms have intermediate rulings more closely spaced than
those shown in this chart. These closely spaced lines are an aid to plotting but are
omitted from most of the charts in this book since reduction to fit the size of the page
results in bringing these lines very close together. The detailed ruling is shown in
Chart 52. (Data of Table 17.)

renders in comparing rates and ratios, it is frequently called ratio ruling. 
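
The claim that the logarithms of a geometric progression form an arithmetic progression can be checked numerically. The sketch below uses the Y values of Table 17 and Python's standard `math.log10`; working without rounding to six places is an assumption of this sketch, and it removes the small discrepancies noted in the table.

```python
import math

values = [128, 192, 288, 432, 648, 972, 1458, 2187]  # Y values of Table 17

# Common logarithms of the series and the year-to-year differences.
logs = [math.log10(v) for v in values]
differences = [b - a for a, b in zip(logs, logs[1:])]

for value, log_value in zip(values, logs):
    print(value, f"{log_value:.6f}")
print([f"{d:.6f}" for d in differences])
```

Every difference equals the logarithm of 1.5, the constant ratio of the series, so the plotted logarithms fall on a straight line.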

How it is made. The construction of the logarithmic scale merely in- 
volves spacing the vertical scale values in proportion to the differences 
between their logarithms. Referring to Chart 39, it will be found that the 
distance from 2 to 3 on the scale is .352 inch, and from 3 to 4 is .250 inch. 
We then have: 


$$\frac{\log 3 - \log 2}{\log 4 - \log 3} = \frac{.352 \text{ inch}}{.250 \text{ inch}}$$

$$\frac{.477 - .301}{.602 - .477} = \frac{.352 \text{ inch}}{.250 \text{ inch}}$$

and the proportion is: 

.176 : .125 : : .352 inch : .250 inch. 
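
The spacing rule can be reproduced in a few lines. This is a sketch only: the function name is hypothetical, and the factor of 2 inches per unit of logarithm is taken from the caption of Chart 39, which states that each vertical distance is twice the difference between the logarithms.

```python
import math


def scale_position(value, bottom=1.0, inches_per_log_unit=2.0):
    """Distance in inches of a scale value above the bottom of the scale."""
    return inches_per_log_unit * (math.log10(value) - math.log10(bottom))


# Distances between successive scale values 1, 2, 3, ..., 10.
for low, high in zip(range(1, 10), range(2, 11)):
    gap = scale_position(high) - scale_position(low)
    print(f"{low} to {high}: {gap:.3f} inch")
```

The gaps shrink steadily from the bottom of the cycle to the top, which produces the characteristic crowding of the rulings near the top of each cycle.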

An alternative approach to an understanding of the logarithmic scale 
does not involve any reference to logarithms. Reference to Chart 34 will 
recall that equal distances on the vertical scale of an arithmetic grid repre- 
sent equal amounts. Equal distances measured 
along a logarithmic scale, however, represent equal 
ratios. On the vertical scale of Chart 38 it may be 
seen that the distance from 100 to 200 is .42 inch; 
likewise the distance from 300 to 600 is .42 inch. 

Measurement will reveal that any two numbers of 
ratio 1 : 2 are separated by .42 inch on this scale. 

On this same scale the distance from 200 to 800 is 
.84 inch, and it follows that any two numbers of
ratio 1 : 4 will be separated by .84 inch. Thus we
see why the semi-logarithmic chart is frequently 
termed the ratio chart. 
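
The same point can be made without tables of logarithms by counting doublings. In the sketch below the figure of .42 inch per doubling is the measurement quoted from Chart 38; the function name is hypothetical.

```python
import math

INCHES_PER_DOUBLING = 0.42  # distance for a 1 : 2 ratio, as read from Chart 38


def log_scale_distance(low, high, inches_per_doubling=INCHES_PER_DOUBLING):
    """Vertical distance between two values on the logarithmic scale."""
    # The distance depends only on the ratio high / low, not on the amounts.
    return inches_per_doubling * math.log2(high / low)


print(log_scale_distance(100, 200))
print(log_scale_distance(300, 600))
print(log_scale_distance(200, 800))
```

The first two pairs differ by 100 and 300 units respectively yet are separated by the same distance, because both stand in the ratio 1 : 2; the third pair, in the ratio 1 : 4, is separated by twice that distance.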

The logarithmic scale. The vertical scale of 
Chart 38 is divided into two parts which are gen- 
erally referred to as cycles or phases. We therefore 
refer to the paper on which Chart 38 was drawn 
as "two-cycle (or two-phase) semi-logarithmic
paper." In labeling the vertical scale of a semi-logarithmic
chart, we may begin with any positive
value. The figure at the top of the first cycle will
be ten times that at the bottom of the cycle; the
figure at the top of the second cycle will be ten
times the figure at the bottom of the second cycle
(the top of the first cycle); and so on.3 In Chart 40 there are illustrated
eight different logarithmic scales beginning with .1, 1, 2, 5, 10, 17, 25,
and 50 respectively. Although it is mathematically permissible to begin
a logarithmic scale with any positive value, it is advisable to select a scale
which will allow interpolations of intermediate values to be made readily.
The scale beginning with 17 would be very difficult to use. If it were
desired to have a three-cycle scale beginning with .5, the various values
of the first scale could be multiplied by 5. Most ready-ruled semi-logarithmic
paper carries along the right-hand edge of the grid such
designations as those shown in Chart 52. These are multiplying factors
and indicate that the value to be written opposite each horizontal
line on the left-hand scale must be the value at the bottom of that cycle
multiplied by the figure shown opposite that horizontal line on the right-hand
scale.

Chart 39. The Logarithmic Scale. The vertical distances are proportional to the
differences between the logarithms. Each vertical distance is twice the difference
between the logarithms measured in inches.

Chart 40. Logarithmic Vertical Scales. The scale beginning with 17 would be difficult
to use.

3 A common logarithm is the power to which 10 must be raised to produce a given
number. Thus, 100 is 10^2, and the logarithm of 100 is 2.0; 10,000 is 10^4, and the
logarithm of 10,000 is 4.0.

If a logarithmic scale were begun with zero, the top of the first cycle 
would be 10 X 0 = 0, and all values on the scale would also be zero. Sup- 
pose that the uppermost value of a three-cycle logarithmic scale is .01. 
Then the bottom of the third cycle is 1/10 of .01, or .001; the bottom of the
second cycle is .0001; and the bottom of the first cycle is .00001. There can
thus be no zero base line, and the semi-logarithmic chart does not permit
interpretation of curves in terms of distances above a base line as does the
arithmetic chart. Although plotted values may, of course, be read against 
the vertical logarithmic scale, no visual impression may be had of the 
absolute magnitudes plotted. The semi-logarithmic chart shows: (1) a 
constant rate of change as a straight line; (2) the rate of increase or decrease 
by the slope of the line; and (3) the comparison of rates or ratios between 
two or more lines by means of parallelism of these lines, or lack of it. 
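
The labeling of cycles, and the reason a logarithmic scale can never reach zero, may be illustrated as follows. This is a sketch with a hypothetical function name; `Decimal` is used only so that the small values print exactly.

```python
from decimal import Decimal


def cycle_bounds(bottom, cycles):
    """Values at the bottom of the scale and at the top of each cycle."""
    return [bottom * Decimal(10) ** k for k in range(cycles + 1)]


# A three-cycle scale beginning with .5.
print(cycle_bounds(Decimal("0.5"), 3))

# Working downward: a three-cycle scale whose uppermost value is .01.
top = Decimal("0.01")
bottom = top / Decimal(10) ** 3
print(cycle_bounds(bottom, 3))

# A scale "beginning with zero" never leaves zero.
print(cycle_bounds(Decimal(0), 3))
```

However many cycles are added below, the bottom value is only divided by ten again, so it approaches zero without reaching it.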

Interpretation of curves. Before proceeding with a consideration of 
applications of the semi-logarithmic chart, we should give attention to 
Charts 41A and 41B and the comments below them. When two curves 
are parallel on semi-logarithmic paper (for example, a, a'; d, d'), we know 
that they have undergone the same rate of change and also that the ratio 
between the two has remained constant. Parallelism between curved 
lines is very difficult to judge with the eye. Reference to the lower sections 
of Chart 41A will show that the curved lines are always the same vertical
distance apart and thus the two curves in each section are parallel with 
respect to the X axis. 
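
That parallel curves on a logarithmic scale imply a constant ratio can be verified with two invented series growing at the same rate. The starting values, the 10 per cent rate, and the function name below are hypothetical and chosen only for illustration.

```python
import math


def grow(first, rate, n):
    """n terms of a series increasing at a constant rate per period."""
    return [first * (1 + rate) ** k for k in range(n)]


upper = grow(1000, 0.10, 6)
lower = grow(50, 0.10, 6)

# Ratio between the two series, and their vertical separation on a log scale.
ratios = [u / l for u, l in zip(upper, lower)]
log_gaps = [math.log10(u) - math.log10(l) for u, l in zip(upper, lower)]

print([round(r, 6) for r in ratios])
print([round(g, 6) for g in log_gaps])
```

The ratio stays at 20 throughout and the logarithmic gap stays at log 20, so the two lines remain the same vertical distance apart at every point.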


Applications 

Comparing rates of increase or decrease. Since there is no zero on the 
vertical scale of the semi-logarithmic chart and thus no base line, and since
equal vertical distances (on the same scale) always represent the same 
ratio, it is permissible to use two or more different vertical scales in order 
to bring curves of different magnitude close together for comparison. This 
has been done in Chart 42 which presents the data of motor vehicle registrations
previously shown on an arithmetic grid in Chart 36. Shifting the
vertical scale of a semi-logarithmic chart moves the curve upward or down- 
ward, but the slope, which is of paramount importance, is not altered 
thereby. When using two logarithmic scales, as in Chart 42, it is highly 
desirable (though not absolutely necessary) to keep the series of smaller 

Chart 41A. Curves on Arithmetic and Semi-Logarithmic Grids. The two curves in
each of the lower eight squares are equidistant vertically from each other.

Arithmetic Vertical Scales
A, A': Constant amounts of increase, same for both curves.
B, B': Different constant amounts of increase, greater for B.
C, C': Different constant amounts of increase, greater for C'.
D, D': Constant amounts of decrease, same for both curves.
E, E': Different constant amounts of decrease, greater for E.
F, F': Different constant amounts of decrease, greater for F'.
G, G': Amounts of increase increasing, same for both curves.
H, H': Amounts of increase decreasing, same for both curves.
I, I': Amounts of decrease increasing, same for both curves.
J, J': Amounts of decrease decreasing, same for both curves.

Logarithmic Vertical Scales
a, a': Constant rates of increase, same for both curves.
b, b': Different constant rates of increase, greater for b.
c, c': Different constant rates of increase, greater for c'.
d, d': Constant rates of decrease, same for both curves.
e, e': Different constant rates of decrease, greater for e.
f, f': Different constant rates of decrease, greater for f'.
g, g': Rates of increase increasing, same for both curves.
h, h': Rates of increase decreasing, same for both curves.
i, i': Rates of decrease increasing, same for both curves.
j, j': Rates of decrease decreasing, same for both curves.




[Chart 41B. Column headings: Arithmetic Vertical Scales; Logarithmic Vertical Scales.
Panel labels, in the order shown:
An arithmetic progression.
A series in which the absolute change is increasing: (a) if relative change is increasing; (b) if relative change is constant; (c) if relative change is decreasing.
A series in which the absolute change is decreasing.
Two arithmetic progressions, same absolute changes.
A geometric progression.
A series in which the relative change is increasing.
A series in which the relative change is decreasing: (A) if absolute change is increasing; (B) if absolute change is constant; (C) if absolute change is decreasing.
Two geometric progressions, same relative changes.]

Chart 41B. Comparisons of Series of Various Types Plotted in Relation to Arithmetic
and Logarithmic Vertical Scales. Series plotted as shown on one scale become as in-
dicated on the other. The above comparisons refer to increasing series only. It is sug-
gested that the reader sketch some comparisons involving declining series.

magnitude below that of greater magnitude; likewise, if one or more com-
ponents are being compared with a total, the curves for the components
should be below that for the total. 

Chart 36 gave us no idea of the relative growth of automobile registra- 
tions in either the United States or in Canada. Chart 42, however, shows 
relative growth for each series and enables us to compare the rates of growth 
of these two series of dissimilar size. In general both series have shown 
about the same rates of increase and decrease throughout the period. The 


[Chart 42: two vertical scales. United States: millions of motor vehicles. Canada: thousands of motor vehicles.]

Chart 42. Motor Vehicle Registrations in the United States and Canada, 1917-1938.
(Data from Automobile Manufacturers Association; 1938 registration for Canada is an
estimate.)


insert on Chart 42 makes it possible to estimate the rate of increase or 
decrease for the curves shown. It does not, however, apply to other charts 
which have different scales. 

An alternate method of showing the relative change in motor vehicle 
registrations in the United States and Canada consists of calculating the 
per cent of change for each year and plotting the results on an arithmetic 
grid. This has been done in Chart 43. 
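
The calculation behind a chart of this kind may be sketched as follows. This is an illustration only: the registration figures below are hypothetical, not the data of Chart 43, and the function name `per_cent_change` is ours.

```python
def per_cent_change(series):
    """Per cent of change from each value to the next."""
    return [100.0 * (b - a) / a for a, b in zip(series, series[1:])]

# Hypothetical registrations for three successive years, in millions.
registrations = [20.0, 23.0, 21.85]

changes = per_cent_change(registrations)
print(changes)  # one increase followed by one decrease
```

Each result is plotted against the later of the two years on an arithmetic grid, increases above the zero line and decreases below it.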

Instead of comparing the rates of change of two different series over the 
same period of time, we may be interested in comparing rates of growth of 
the same series at different times. Thus in Chart 42 we can see that the
rate of increase of United States automobile registrations was greater from 
1935 to 1936 than from 1936 to 1937, and also that the rate of decline was 
greater from 1931 to 1932 than from 1932 to 1933. Similar conclusions 
may be drawn from Chart 43. 

It is frequently necessary to compare series which are expressed in 
different units. For example, we may compare any two or more of the 
following: commercial failures, in millions of dollars; volume of trading on 
a stock exchange, in number of shares traded; coal production, in 2,000- 
pound tons; petroleum production, in 42-gallon barrels; lumber production, 

[Chart 43: arithmetic grid. Vertical scale: per cent of change. Horizontal scale: 1917 to 1938.]

Chart 43. Annual Per Cent of Increase or Decrease in Motor Registrations in the
United States and in Canada, 1917-1938. (Data from same source as Chart 42.)

in board feet; cement production, in 350-pound barrels; electric power
produced, in kilowatt hours; manufactured gas, in cubic feet. It is pos-
sible to reduce 350-pound barrels to tons, but it is not possible to change
kilowatt hours to board feet, or vice versa. 

While it is possible to plot two series expressed in different units on an 
arithmetic grid, it is not often that such a comparison is useful. We are 
not likely to be interested in comparing the changes in electric power pro- 
duction in kilowatt hours with the changes in cement production in barrels. 
Rather are we apt to want to compare the percentage change in electric 
power production with the percentage change in cement production. On 
the semi-logarithmic grid there is no zero base line; only the slope of a
curve has meaning, and we are enabled to make a valid comparison
of the relative changes in the two series expressed in such dissimilar units 
as those just mentioned. Chart 44 shows a comparison of the production 
of electric energy and of portland cement. Among other interesting com- 
parisons may be noted the more rapid relative growth in the production of 
electric energy from 1921 to 1929 and the relatively more severe decline 
in production of cement from 1929 to 1933. 

Comparing fluctuations. The comparison of the fluctuations taking 
place in two series of different size may be illustrated by reference to the 

[Chart 44: two vertical scales, one for electric power (in millions) and one for portland cement (in thousands). Horizontal scale: 1921 to 1937.]

Chart 44. Average Monthly Production of Electric Power and of Portland Cement,
1921-1937. (Data from Survey of Current Business, 1938 Supplement, p. 150.)

1902-1938 annual prices of pig iron and of finished steel, the latter being 
an average of quotations on steel bars, beams, tank plates, plain wire, open-
hearth rails, black pipe, and black sheets. The quotations for pig iron are
given in dollars per long ton and varied from a low of $12.95 in 1914 to a
high of $42.76 in 1920. Finished steel, on the other hand, is quoted in 
terms of cents per pound, and the price varied from 1.433 cents in 1914 to 
4.191 cents in 1917. If a vertical scale were designed to accommodate the 
quotations for pig iron, it is easy to see that the steel price curve would 
virtually coincide with the base line. If a vertical scale were designed to 
fit the prices of steel, the curve for pig iron prices would be above the top 
of any reasonable-sized piece of graph paper. Using a single arithmetic
vertical scale, we cannot plot these two series on the same axes. 

However, we can transform the prices of finished steel to prices per ton. 
The results are shown in Chart 45. Here it is seen that the prices of fin- 
ished steel are higher than the prices of pig iron. We can also note the 
highs and lows for each series, and can observe that the absolute fluctuation 
in prices during 1914-1924 was less for pig iron. If we are interested in the 
severity of the relative fluctuations, we should examine the curves of Chart 
46. This semi-logarithmic chart shows clearly that the relative fluctua- 



[Chart 45: arithmetic grid. Horizontal scale: 1902 to 1938.]

Chart 45. Price per Long Ton of Pig Iron and of Finished Steel. (Data from
Standard Statistics Company, Basic Statistics, 1936, pp. G6 and G12, and various issues
of Current Statistics. Both price series are composites.)

tions during 1914-1924 were about the same. Note that, while the abso- 
lute rise from 1919 to 1920 was greater for steel (Chart 45), the relative 
increase was greater for pig iron (Chart 46). Note also that a greater 
relative rise is shown for pig iron during the period 1932-1934. 

In order to plot these two series on the arithmetic chart, it was necessary 
to express steel prices on a per ton basis. Such an adjustment would not 
have been possible if we had been dealing with one series expressed in 
terms of tons and another in terms of yards (say, for example, iron and 
rayon). Although we used both price series in terms of dollars per ton 
for Chart 46, the semi-logarithmic chart does not impose such a require- 
ment. When we are interested only in the relative change which has 
taken place, it does not matter if either the money unit or the physical
unit or both differ for the two series. Chart 47 shows the relative fluctua-
tions in the prices of pig iron (in dollars per ton) and of steel (in cents per


[Chart 46: semi-logarithmic chart. Vertical scale: dollars per long ton. Horizontal scale: 1902 to 1938.]

Chart 46. Price per Long Ton of Pig Iron and Finished Steel, 1902-1938. (Data
from same source as Chart 45.)


pound). Notice that the steel price curves of Charts 46 and 47, although
the prices are in terms of different units, would coincide if superimposed 
upon each other. 
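
Why the two steel curves coincide may be seen from a short calculation. The sketch below uses the two steel quotations given above (1.433 cents per pound in 1914 and 4.191 cents in 1917) and the standard figure of 2,240 pounds to the long ton; the function and variable names are ours.

```python
import math

POUNDS_PER_LONG_TON = 2240  # standard definition of the long ton

def cents_per_pound_to_dollars_per_long_ton(cents):
    """Convert a price in cents per pound to dollars per long ton."""
    return cents * POUNDS_PER_LONG_TON / 100.0

steel_cents = {1914: 1.433, 1917: 4.191}
steel_dollars = {year: cents_per_pound_to_dollars_per_long_ton(price)
                 for year, price in steel_cents.items()}

# Vertical rise on a logarithmic scale between the two years, in each unit.
rise_in_cents = math.log10(steel_cents[1917]) - math.log10(steel_cents[1914])
rise_in_dollars = math.log10(steel_dollars[1917]) - math.log10(steel_dollars[1914])

print(rise_in_cents, rise_in_dollars)  # the two rises agree
```

Changing the unit multiplies every value by the same constant, which adds the same amount to every logarithm; the curve is moved up or down as a whole, but its shape is unchanged.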


[Chart 47: semi-logarithmic chart with two vertical scales. Pig iron: dollars per long ton. Steel: cents per pound.]

Chart 47. Price of Pig Iron per Long Ton and of Finished Steel per Pound, 1902-1938.
(Data from same source as Chart 45.)


Instead of being interested in two series, we may wish to compare the 
undulations of a single series which fluctuated around relatively small 
values during one period and around decidedly larger values at another
time. For example, commercial failures were around $100,000,000 to
$200,000,000 from 1895 to 1910. From 1921 to 1933, however, they ranged
from $400,000,000 to $933,000,000. The semi-logarithmic chart enables 
us to study the relative severity of the fluctuations during such different 
periods. 
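
The point may be illustrated with the limits just quoted. The following sketch compares the width of each range as it would appear on an arithmetic scale and on a logarithmic scale; the function name `spread` is ours, and only the four limits given above are used.

```python
import math

def spread(low, high):
    """Return (arithmetic spread, logarithmic spread) of a range of values."""
    return high - low, math.log10(high) - math.log10(low)

# Commercial failures, in millions of dollars.
early_abs, early_log = spread(100, 200)  # 1895 to 1910
late_abs, late_log = spread(400, 933)    # 1921 to 1933

arithmetic_ratio = late_abs / early_abs   # how much wider the later band looks
logarithmic_ratio = late_log / early_log  # the same comparison in relative terms

print(arithmetic_ratio, logarithmic_ratio)
```

On an arithmetic scale the later band is more than five times as wide as the earlier one; on a logarithmic scale it is only slightly wider, since the highest value is roughly double the lowest in both periods.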

Showing ratios. Chart 48 shows how ratios may be presented on the 
semi-logarithmic chart. The two series plotted are the price per bushel 
received by farmers for corn, and the price per 100 pounds received by 
farmers for hogs. When corn is bringing a price which is low in relation


[Chart 48: semi-logarithmic chart with two vertical scales. Corn: dollars per bushel. Hogs: dollars per 100 pounds.]

Chart 48. Average Farm Prices of Corn per Bushel and of Hogs per Hundred
Pounds, 1934-1938. The supplementary scale enables us to read the ratio of hog
prices to corn prices for any month. The value 11 is placed opposite the line for corn,
and the value opposite the hog line is the ratio of hog prices per 100 pounds to corn
prices per bushel. For May 1937 the ratio is shown to be 7.7, which may be verified by
referring to Chart 49. The supplementary scale is graduated the same as the scales
at the sides of the chart. The figure 11 is placed opposite the corn line because the
hog scale of the diagram shows values which are 11 times the corresponding values on
the corn scale. (Data from various issues of Crops and Markets.)

to the price of hogs, farmers will generally find it profitable to feed corn to
hogs rather than to sell the corn for cash. On the other hand, when corn
is bringing a price which is high in relation to that of hogs, farmers will
tend to sell corn for cash. If 100 pounds of hogs brings the farmer about
11 times as much as a bushel of corn, it is largely immaterial to the farmer
whether he sells his corn for cash or feeds the corn to his hogs.^ For this
reason the two scales of Chart 48 have been placed in an 11 to 1 ratio.®
The chart shows not only the fluctuations in the price of hogs and the price 

^ See pp. 155-156, where the hog-corn ratio is discussed.

® The scale for hog prices is awkward but is unavoidable in this instance. 
of corn, but also makes it easy to see when the price of 100 pounds of hogs
is more than, less than, or exactly, 11 times the price of a bushel of corn. 
When 100 pounds of hogs is selling for more than 11 times as much as a 
bushel of corn, the curve for hogs is above the curve for corn, hogs are 
relatively valuable, and farmers tend to feed corn to their hogs. When 100 
pounds of hogs is selling for less than 11 times as much as a bushel of corn, 
the curve for hogs is below that for corn, corn is relatively valuable, and 
farmers tend to sell corn for cash. When the two curves are parallel, the 
ratio is remaining constant; when the corn price curve is sloping upward
more rapidly (or downward less rapidly) than the hog price curve, corn is 

[Chart 49: arithmetic grid. Vertical scale: hog-corn ratio. Horizontal scale: 1934 to 1938.]

Chart 49. Hog-Corn Ratio, 1934-1938. The ratio is obtained by dividing the aver-
age farm price of hogs per 100 pounds by the average farm price of corn per bushel.
(Data from Crops and Markets, January 1939, p. 10.)

becoming more valuable in relation to hogs; when the corn price curve is 
sloping upward less rapidly (or downward more rapidly) than the hog price 
curve, corn is becoming less valuable in relation to hogs. The supple- 
mentary scale, shown on the chart, enables the reader to measure the ratio 
between the two price curves at any time. 

Chart 49 illustrates another method of showing the relationship between
hog and corn prices. Here the ratio of hog prices to corn prices has been
computed for each month and plotted on an arithmetic grid. The ratio
may be studied without the use of a supplementary scale, but changes in
corn prices and in hog prices are not shown.
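
The ratio plotted in Chart 49, and the comparison with the break-even value of 11 described above, may be sketched as follows. The prices used are hypothetical (the first pair was chosen merely to give a ratio of 7.7), and the function names are ours.

```python
BREAK_EVEN_RATIO = 11.0  # the 11 to 1 ratio discussed in the text

def hog_corn_ratio(hog_price_per_100_lb, corn_price_per_bushel):
    """Price of 100 pounds of hogs divided by the price of a bushel of corn."""
    return hog_price_per_100_lb / corn_price_per_bushel

def farmer_tendency(ratio, break_even=BREAK_EVEN_RATIO):
    """What farmers tend to do at a given hog-corn ratio."""
    if ratio > break_even:
        return "feed corn to hogs"
    if ratio < break_even:
        return "sell corn for cash"
    return "largely immaterial"

# Hypothetical prices, in dollars.
low_ratio = hog_corn_ratio(10.01, 1.30)
high_ratio = hog_corn_ratio(13.00, 1.00)

print(round(low_ratio, 1), farmer_tendency(low_ratio))
print(round(high_ratio, 1), farmer_tendency(high_ratio))
```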

Interpolation and extrapolation. While an interpolation on an arith-
metic chart is an arithmetic interpolation, an interpolation on the semi-
logarithmic chart is a logarithmic interpolation. Thus, if we refer to
Chart 38 and graphically interpolate for the Y value midway between 1935
and 1936, we obtain about 790, which is approximately the same figure that
we get if we use (log 648 + log 972) / 2 and take the anti-logarithm of
the result.
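
The arithmetic of this interpolation may be sketched as follows, using the two values quoted above (648 and 972). The function names are ours; the arithmetic midpoint is included only for contrast.

```python
import math

def logarithmic_midpoint(y0, y1):
    """Anti-logarithm of the average of the two logarithms."""
    return 10 ** ((math.log10(y0) + math.log10(y1)) / 2)

def arithmetic_midpoint(y0, y1):
    """Ordinary average of the two values."""
    return (y0 + y1) / 2

log_mid = logarithmic_midpoint(648, 972)
arith_mid = arithmetic_midpoint(648, 972)

print(round(log_mid, 1), arith_mid)
```

The logarithmic midpoint is the geometric mean of the two values and falls somewhat below the arithmetic midpoint; a graphical reading from the chart can be expected to agree with it only approximately.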

Extrapolation consists of extending the curve at one end or the other.
When we extend the curve to estimate for later years than those for which
we have data, we are forecasting. This application of the semi-logarithmic
chart is decidedly of questionable value if it involves merely the extension
of a curve which has indicated in the past that the data exhibit a fairly
constant rate of increase. Any forecasting procedure which involves


[Chart 50: semi-logarithmic chart. Vertical scale: urban population in millions.]

Chart 50. Urban Population of the United States, 1880-1930, and Two Estimates
for 1940. A dubious application of the semi-logarithmic chart. (Data from Fifteenth
Census of the United States, 1930, Population, Volume I, p. 8. A slight change in the
classifications “urban” and “rural” took place in 1930. See p. 46 of this text.)

merely the continuation of a curve or the automatic application of a 
formula, without at the same time requiring a careful consideration of 
underlying and modifying factors, is hardly to be depended upon, particu- 
larly if economic conditions are in a state of flux. The curve of Chart 50 
shows the population of the United States classed as “urban” at each census
from 1880 to 1930 inclusive. While the extensions of the curve indicate 
two possible estimates for 1940, it should be realized that any estimate of 
urban population in 1940 based only on a knowledge of the six preceding 
censuses can have little validity. What of the subsistence farming move-
ment? What of decreased immigration? What of birth control? What
of those who went back to live on the home farm in the depression years?

Flexible Logarithmic Scales

One logarithmic cycle will accommodate a ten-fold increase; two cycles 
make provision for a hundred-fold increase. Reference to the various 
charts included in this chapter will show that no vertical logarithmic scale 
extends over more than two cycles. Two-cycle semi-logarithmic paper 
will suffice for most series which the chart maker is likely to encounter; 
rarely will he need paper covering more than three cycles, since it allows 
for a thousand-fold increase. Even in cases where a series of very small 
magnitude must be compared with one of very large magnitude, a number
of cycles is not needed, since it is desirable to use two vertical scales to 
bring the two curves together for comparison, as in Chart 42. Many sorts 
of ready-ruled semi-logarithmic paper are available from various sources. 
If, however, only two-cycle paper is available and paper having more 
cycles is needed, it is merely necessary to trim the lower margin from a 
sheet of two-cycle paper and paste it above another sheet. 
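
The number of cycles a given series requires may be estimated before the paper is chosen. The sketch below assumes, as on ready-ruled paper, that each cycle begins at a power of ten; the function name `cycles_needed` is ours, and the last example is hypothetical.

```python
import math

def cycles_needed(lowest, highest):
    """Number of ten-fold cycles required to cover the range of a series,
    assuming that each cycle starts at a power of ten."""
    bottom = math.floor(math.log10(lowest))  # power of ten at or below the lowest value
    top = math.ceil(math.log10(highest))     # power of ten at or above the highest value
    return max(top - bottom, 1)

print(cycles_needed(12.95, 42.76))  # pig iron prices quoted earlier: 10 to 100
print(cycles_needed(100, 933))      # commercial failures, millions of dollars
print(cycles_needed(8, 2500))       # a hypothetical series of very wide range
```

A series may need one cycle more than its ten-fold range alone suggests when its values straddle a power of ten; a series running from 8 to 12, for example, requires two cycles.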

At times it may be desirable to use one- or two-cycle paper, but with a
larger or smaller size cycle than those which are readily available. Using 
an ordinary sheet of semi-logarithmic paper and placing a sheet of plain 
paper diagonally on top of it, a logarithmic scale may be expanded as shown 
in Chart 51. A logarithmic scale may be contracted by placing a sheet of 
semi-logarithmic paper diagonally on a piece of plain paper and ruling 
horizontal lines as shown in Chart 52. For those who have frequent occa- 
sion to use logarithmic scales of varying size, a device such as that shown 
in Chart 53 is useful.® The original of this chart provides a logarithmic 


Scale value    Logarithm    Difference (from preceding logarithm)
      1         0
      2          .301030      .301030
      3          .477121      .176091
      4          .602060      .124939
      5          .698970      .096910
      6          .778151      .079181
      7          .845098      .066947
      8          .903090      .057992
      9          .954243      .051153
     10         1.000000      .045757
     20         1.301030      .301030
     30         1.477121      .176091
     40         1.602060      .124939
     50         1.698970      .096910
     60         1.778151      .079181
     70         1.845098      .066947
     80         1.903090      .057992
     90         1.954243      .051153
    100         2.000000      .045757

® Purchasable from Harriet Edmunds, 202 East 44th Street, New York City.

cycle varying from 1-|- inches to 12 inches. Of course, any number of
cycles may be built up on top of one another. 

In case no suitable logarithmic paper and no logarithmic scales of any 
sort are available, it is possible to construct a logarithmic scale of any 
desired size by referring to a table of logarithms. With scale values spaced 
in proportion to the differences between their logarithms, a scale may be 
constructed in terms of any convenient unit. From the figures on page 120 
it is seen that the distance from 1 to 2 would be .301030 units, the distance 
from 2 to 3 would be .176091 units, and so on. Intermediate values are
located similarly. 
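
The construction just described may be sketched as follows. The function name `scale_positions` and the parameter `cycle_length` (the length of one cycle in whatever unit is convenient) are ours, introduced only for illustration.

```python
import math

def scale_positions(values, cycle_length=1.0):
    """Distance of each scale value above the point marked 1,
    with one ten-fold cycle occupying `cycle_length` units."""
    return {v: cycle_length * math.log10(v) for v in values}

positions = scale_positions(range(1, 11))
spacing_1_to_2 = positions[2] - positions[1]
spacing_2_to_3 = positions[3] - positions[2]
print(round(spacing_1_to_2, 6), round(spacing_2_to_3, 6))

# The same scale drawn with a cycle 5 inches long.
five_inch = scale_positions([1, 2, 5, 10], cycle_length=5.0)
print(five_inch)
```

Intermediate values such as 2.5 may be passed to the same function, and a second cycle is obtained by repeating the spacings above the point marked 10.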

The usefulness of logarithmic scales is not limited to the applications 
shown in this chapter. In Chapter XI we will make use of a horizontal 
logarithmic scale and an arithmetic vertical scale. In Chapters VIII and 
XXIII we will use logarithmic scales on both the horizontal and vertical
axes. 

CHAPTER VI

GRAPHIC PRESENTATION 

OTHER TYPES OF CHARTS 


Not only may we use curves to present statistical information, but there 
are available a number of other graphic devices as well. In this chapter 
we shall give attention to bar charts, pie diagrams, pictorial charts, and 
statistical maps. 


Bases of Comparison 

Chart 54 shows how the number of tractors on farms may be compared 
by means of three types of diagrams: (A) a bar chart involving one-dimen- 
sional comparisons; (B) and (C) circles and squares, involving two- 
dimensional comparisons; and (D) a three-dimensional comparison repre- 
sented by tractors of varying sizes. Readers of charts obtain most accu- 
rate impressions of the magnitudes shown when data are represented by 
means of bar charts, and least accurate impressions when data are repre- 
sented by volume diagrams. Area diagrams are more accurately judged 
than volume diagrams, but less accurately than bar charts.¹ It should
also be remembered that volume diagrams shown on the printed page make
it necessary for the reader to visualize the third dimension before making
his comparison. Another disadvantage of charts using squares, circles,
or pictures of different sizes is that the reader may be uncertain whether 
to compare heights, areas, or volumes. In any event the basis upon which 
the diagram was drawn should be indicated. If it is argued that the cor- 
rect basis of comparing the size of such objects as tractors is the apparent 
weight of the different tractors, but if the statistician has drawn the trac- 
tors so that the number of tractors in different years is shown by the height 
of the tractors, as is often done, then the reader who judges the sizes upon 
the basis of apparent weight (essentially volume) will get an exaggerated 
impression of the variation in number of tractors during the different years. 
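
The size of the exaggeration may be worked out directly. The following sketch assumes a quantity that has doubled (a hypothetical case) and shows, first, the impression given when only the height of a figure is doubled and, second, the linear dimension that should be used if the comparison is to be made by area or by volume. The function names are ours.

```python
def apparent_ratio(height_ratio, dimensions):
    """Ratio a reader perceives when a figure's height is scaled by
    `height_ratio` and the reader judges by area (2) or volume (3)."""
    return height_ratio ** dimensions

def linear_scale(value, base_value, dimensions):
    """Factor by which to scale each linear dimension so that length (1),
    area (2), or volume (3) is proportional to the value shown."""
    return (value / base_value) ** (1.0 / dimensions)

# A quantity that has doubled, drawn with the height doubled.
print(apparent_ratio(2, 2), apparent_ratio(2, 3))  # judged by area, by volume

# Linear dimensions that keep the area or the volume in a 2 to 1 ratio.
side = linear_scale(2, 1, 2)
edge = linear_scale(2, 1, 3)
print(round(side, 3), round(edge, 3))
```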


¹ See “Graphic Comparisons by Bars, Squares, Circles, and Cubes,” by Frederick E.
Croxton and Harold Stein, Journal of the American Statistical Association, March 1932,
pp. 54-60.

[Chart 54: four panels, A to D, each comparing 1925, 1930, 1936, 1937, and 1938.]

Chart 54. Number of Tractors on Farms in the United States, 1925, 1930, and
1936-1938. Represented by (A) bars, (B) circles, (C) squares, and (D) tractors. Parts
B and C show the comparisons by areas, while part D shows the comparisons by vol-
umes. (Data for 1925 and 1930 from Fifteenth Census of the United States, 1930, Agri-
culture, Volume II, Part I, p. 55. Data for 1936-1938 are from Farm Implement News,
April 7, 1938.)

Bar Charts

The bar chart shown in section A of Chart 54 is a simplified form using
no scale. In Chart 55 the same data are shown by means of a bar chart 
which has a scale and which also varies the spacing between the bars in 
order to call attention to the fact that the time intervals vary. When the 

[Chart 55: bar chart. Vertical scale: thousands of tractors.]

Chart 55. Number of Tractors on Farms in the United States, 1925, 1930, and 1936-
1938. (Data from same sources as Chart 54.)

chart is expected merely to convey a very general impression, simple bar 
charts may be drawn without the use of a scale as in section A of Chart 54. 
However, the scale should not be omitted when two or more bar charts are 
shown depicting different magnitudes. Consider Chart 56, which shows 




[Chart 56: two sets of vertical bars without vertical scales, one for assets and one for surplus, each running from 1927 to 1937.]

Chart 56. Assets and Surplus of an Insurance Company, 1927-1937, as Shown in
an Advertisement. Because of the absence of vertical scales and the proximity of the
two sets of bars, the chart gives the visual impression that surplus is nearly as great
as assets.
the assets and surplus of an insurance company as set forth in an adver-
tisement. The chart shows that both assets and surplus have grown, but
it also gives the impression that surplus, in the later years, was almost as
great as assets. This, of course, is incorrect. Each part should have had 
a vertical scale, which would have shown 1937 assets to have been slightly 
over $200,000,000 and 1937 surplus to have been about $12,000,000. 

All the bar charts that have been shown are representations of chrono- 
logical data and, following the customary procedure, the bars have been 
arranged vertically. Vertical bars should also be used for data classified

Chart 57. Value of United States Imports from Each Continent, 1936. (Data from
Statistical Abstract of the United States, 1937, p. 448.)

^ Africa $51,000,000; Oceania $36,100,000. 

quantitatively, as in Chart 61. When making comparisons of data classi- 
fied qualitatively or geographically, on the other hand, horizontal bars 
are generally used. Chart 57 shows such a comparison of United States 
imports from each continent. There are no set rules to be observed in 
drawing bar charts. Certain considerations, however, are helpful. 

(1) Individual bars should be neither exceedingly short and wide nor 
very long and narrow. 

(2) Bars should be separated by spaces which are not less than about | 
the width of a bar or greater than about the width of a bar. 

(3) A scale is generally useful. It should be not more than i the width 
of a bar from the top bar (or from the left-hand bar if the bars are vertical). 

(4) Guide lines are an aid in reading the chart. Sometimes the chart

Chart 58. An Application of the Bar Chart. (From United States Bureau of Labor
Statistics, Wages, Hours, and Working Conditions in the Bread-Baking Industry, 1934,
Bulletin No. 623, p. 75.)

is enclosed and the guide lines are extended through the entire chart as 
in Chart 57; sometimes the chart is not enclosed and the guide lines are 
cut off as in Chart 60. 

When showing a time series graphically, we may use either a bar chart 
or a curve. If the series covers many years, it is generally not desirable

[Chart 59: two-unit bar chart. Vertical scale: millions of persons.]

Chart 59. Native-Born and Foreign-Born Population of the United States, 1870-1930.
The relative growth of the two series is not apparent from this type of chart, but may
be shown by means of a semi-logarithmic chart as described in the preceding chapter.
(Data from the Statistical Abstract of the United States, 1937, p. 11.)
to use a bar chart, which is laborious to construct. A curve facilitates a
study of the general change which has taken place in a series, whereas a
bar chart enables comparisons of specific years to be made more readily.
Chart 58 shows an interesting application of the principle of the bar 
chart. It shows for each of 93 bakeries the proportion of day and night 

Chart 60. Acreage Harvested in the United States of Corn, Wheat, Oats, Barley,
and Rye, 1936 and 1937. (Data from Agricultural Statistics, 1938, pp. 10, 33, 43, 57,
and 67. Data for 1937 are preliminary.)

operation in 1934. The advantage of this chart is that it shows the in- 
formation for each of the 93 concerns in a more compact form than could 
well be done otherwise. 

Sometimes we wish to compare two sets of data over a period of several 
years. This may be done by means of a two-unit bar chart, as shown in 
Chart 59. Similarly, we may wish to compare several categories for
two years; such a comparison is shown in Chart 60. We may also use the 

Chart 61. Employable Persons in Philadelphia, by Age Group and Sex, May 1936.
(Data from Gladys L. Palmer, Recent Trends in Employment and Unemployment in
Philadelphia, pp. 50, 55, Works Progress Administration, National Research Project
in cooperation with Industrial Research Department, University of Pennsylvania,
Report No. P-1, December 1937.)

Chart 62. Per Cent Change in Employment in Eight Divisions of the Textile In-
dustry, September-October, 1938. (Based on data from Monthly Labor Review, Janu-
ary 1939, p. 237.)

two-unit bar chart to compare several categories each of which is subdi-
vided into two parts, as in Chart 61.

A two-direction bar chart, such as Chart 62, may be used to show in- 
creases and decreases. Such a chart is even more effective if increases 
can be shown in black and decreases in red. Increases and decreases in 
a series of data for a number of years may be shown by means of vertical 
bars above and below a horizontal zero line.

[Chart 63: pictograph with horizontal rows of tractor symbols. Each symbol represents 250,000 tractors. Credit: Pictorial Statistics, Inc.]

Chart 63. Tractors on Farms in the United States, 1925, 1930, and 1936-1938, as
Shown by a Pictograph. (Data from same sources as Chart 54.)

Pictorial Devices 

In section D of Chart 54 the number of tractors on farms at each of cer- 
tain years was represented by means of pictures of tractors of varying size. 
While this sort of chart does not convey a satisfactory comparison to a 
reader, it does attract attention. The pictorial value may be retained by 
using a number of small pictures, all of the same size, and arranging them 
so as to form a bar chart. Such a graph is often referred to as a pictograph.
Chart 63 shows a comparison of tractors on farms by means of this device. 
While the diagram is essentially a bar chart, it is more attractive and thus is 
more likely to be examined by a reader. No scale is used, but since the 
pictures are all of the same size and since each represents 250,000 tractors, 
approximate numerical values may be had from the chart, if they are 
wanted. Although a bar chart of a time series generally uses vertical
bars, it will be observed that the pictograph shown as Chart 63 has hori- 
zontal bars. Pictographs are often arranged in this way because it seems 

[Chart 64: pictograph titled “Capacity and Production of Electric Power Plants,” with rows for 1909, 1919, 1929, and 1934. Each dynamo represents 5 million KW capacity. Each bolt represents 10 billion KWH production.]

Chart 64. A Pictograph from Rudolph Modley, How to Use Pictorial Statistics,
p. 35, Harper and Brothers, New York, 1937.

more suitable to have tractors, people, houses (or whatever is being pic-
tured) standing side by side rather than on top of one another.
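
The number of symbols to be drawn for each row of a pictograph such as Chart 63 follows directly from the value assigned to one symbol. The sketch below uses the figure of 250,000 tractors per symbol given above; the tractor count is hypothetical, and the function name `symbols_for` is ours.

```python
def symbols_for(value, per_symbol=250_000):
    """Return (whole symbols, fraction of a further symbol) for one row."""
    whole, remainder = divmod(value, per_symbol)
    return whole, remainder / per_symbol

# A hypothetical count of tractors for one year.
whole, fraction = symbols_for(1_375_000)
print(whole, fraction)  # five whole symbols and half of a sixth
```

Read in the other direction, counting the symbols in a row and multiplying by 250,000 gives the approximate numerical value the row represents.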

Chart 64, another example of a pictograph, shows a comparison of the 
capacity and production of electric power plants. Chart 65 represents a 
slightly different application of the idea, in that bars are actually used but 
the pictures are shown in white against the black background of the bars. 
It should be apparent that, in making a pictograph, the picture is so chosen 


[Chart 65: bar chart titled “Index Numbers of Aggregate Employment, Man-Hours, and Pay Rolls in the Folding-Paper-Box Industry.”]

Chart 65. A Modified Pictograph. (From United States Bureau of Labor Statistics,
Wages, Hours, and Working Conditions in the Folding-Paper-Box Industry, 1933, 1934,
and 1935, Bulletin No. 620, p. 51.)

as to suggest the nature of the data being shown. Certain basic rules for 
the use of pictorial devices are shown in Chart 66.

Component Part Charts 

The parts of a total may be shown by means of a bar as in Chart 67 or
by a pie diagram as in Chart 68. The bar chart involves a one-dimensional
comparison of the lengths of the sections of the bar; whereas the pie
diagram involves a two-dimensional comparison of the pie sections, or a
one-dimensional comparison of the arcs of the pie sections, or a comparison
of the central angles. Accuracy of judgment is about the same whether
based on a bar chart or a pie diagram,² with the exception that 25 per cent
(shown by a right angle) and 50 per cent (shown by a diameter) sections
are more accurately gauged from a pie diagram. The pictorial value of
the pie diagram is perhaps greater than that of the bar chart, and it is
increased when the pie diagram is designed to suggest a silver dollar. Chart
69 shows an interesting use made of the pie diagram, which in this case
represents 50 cents since that is the fare charged on certain tunnels and
bridges operated by the Port of New York Authority.

² See "Bar Charts Versus Circle Diagrams," by Frederick E. Croxton and Roy E.
Stryker, Journal of the American Statistical Association, December 1927, pp. 473-482.

BASIC RULES
I. SYMBOLS SHOULD BE SELF-EXPLANATORY

Chart 66. Basic Rules for Drawing Pictographs as Suggested by Rudolph Modley.
(From Rudolph Modley, How to Use Pictorial Statistics, p. 15, Harper and Brothers,
New York, 1937.)

A single component-part bar is occasionally drawn without a scale and is
sometimes horizontal. One advantage of the vertical bar over either the
horizontal bar or the pie diagram is that the sections are easier to
label (see Chart 67).

Chart 67. Proportion of the Population of the United States in Each Specified
Age Group, 1930. (Data from Fifteenth Census of the United States: 1930,
Population, Volume II, p. S7o.)

Several suppliers of graph paper offer sheets showing a circle with the
circumference graduated from 0 to 100, thus enabling us to construct pie
diagrams more readily.

If such sheets are not available or if varying sizes of circles are desired,
pie diagrams may be made by the use of compasses and a protractor. Since
the conventional protractor divides a circle into 360 parts or degrees, the
percentages which are to be shown should be multiplied by 3.6. Dividing a
circle into percentages is facilitated by the use of a protractor³ calibrated
to divide a circle into 100 parts, as shown in Chart 70; such a scale may be
engraved or otherwise marked on the back of an ordinary protractor (page 137).
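The conversion from percentages to degrees may be sketched in a few lines of Python. This is only an illustration: the function name and the four percentages are hypothetical and are not taken from any chart in this chapter.

```python
def percent_to_degrees(percent: float) -> float:
    """Central angle, in degrees, of a pie section showing `percent` of the total.

    Multiplying by 360/100 is the same as multiplying by 3.6.
    """
    return percent * 360 / 100


# Hypothetical percentages for four component parts (they add to 100).
shares = [40.0, 25.0, 20.0, 15.0]
angles = [percent_to_degrees(p) for p in shares]

for p, a in zip(shares, angles):
    print(f"{p:5.1f} per cent -> {a:6.1f} degrees")
print("total:", sum(angles), "degrees")
```

A 25 per cent section comes out as a right angle (90 degrees), and the angles together account for the full 360 degrees of the circle.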

Chart 71 shows how bar charts may be used to compare several sets of 
component parts and also how the same comparisons may be made by 
means of pie diagrams. It seems clear that comparisons between the years 
are made more easily from the bars than from the circles. The guide lines 
running from section to section assist in making comparisons from the bar 
chart: when the lines are parallel there has been no change; when they 
diverge, there has been an increase; when they converge, a decrease has 
occurred. 

The comparison of component parts in Chart 71 is on a relative basis;
the proportion of each age group in the population is shown. When we
indicate how many of each age group were enumerated, we have diagrams
such as are shown in Chart 72. The bars and circles vary in size because
the total population has increased. In this instance the bar chart is clearly
preferable to the pie diagram. When data such as those shown in Charts
71 and 72 cover a number of years, it is generally preferable to make use of
curves as was done in Charts 30 and 31. While the bar charts of Charts
71 and 72 present chronological data, we may also compare component
parts for different places or categories. For example, we might compare
the proportions of males and females in the urban population with the
proportions of males and females in the rural population. One bar, subdivided
for males and females, would represent the urban population; the other
bar, similarly divided for the sexes, would represent the rural population.

³ See "A Percentage Protractor," by Frederick E. Croxton, Journal of the American
Statistical Association, March 1922, pp. 108-109.

Chart 68. Proportion of the Population of the United States in Each Specified Age
Group, 1930. (Data from same source as Chart 67.)

Chart 69. A Pie Diagram Used by the Port of New York Authority. This was a
4-page folder about 3i inches in diameter. Pictured are pages 1 and 3. Page 1 was
printed in silver and black. Page 2 showed Port Authority name and address and
the statement "The 50¢ toll - where it went, bridges and tunnel, year 1935." Page 4
gave a brief statement of the expenditure for each of the seven items going to make up
"operation and maintenance."

Statistical Maps 

Statistical maps are graphic devices which show quantitative informa- 
tion on a geographical basis. We shall consider hatched or shaded maps, 
dot maps, and pin maps. 

Hatched maps. Hatched or shaded maps undertake to show for each
geographical area under consideration the magnitude of the phenomenon
which is being studied. The variations in magnitude are represented
graphically by progressive differences in hatching or shading. In Chart
73 the various hatchings indicate the crop conditions in the drought area
of the United States during the period 1930-1936. The counties having
relatively poorest crops are shown in solid black, and the hatching becomes
progressively lighter so that the lightest indicates the counties which had
relatively the best crops. Since the drought area did not follow state lines,
the parts of six states which were not considered as in the drought area are
shown in white. The outstanding characteristic of maps such as this is that
a progressive change in the hatching or shading indicates an increase (or
occasionally a decrease) in the phenomenon being measured.

Chart 74 shows a shaded map. On this map the states limiting truck 
drivers to 7 or 8 hours at the wheel are indicated by white; progressively 
darker shaded areas show the states which permit longer driving periods; 
solid black indicates no limit. 

Sometimes statistical maps are made in colors. However, the principle 
of progressive shading cannot be developed satisfactorily by using different
colors. It is possible, of course, to use progressive shades of a single color 
and thus sometimes to produce a more attractive map than could be done 
by using black and white. 

Dot maps. The preceding statistical maps were used to show averages
or ratios: average yield per acre and hours per day. When, however,
the object is to show the geographical distribution of occurrences, the dot
map should be used. Chart 75 shows one of the simplest of dot maps.
Each dot represents a service station, and the concentration of them in
various parts of the country is clearly shown. In order to avoid heavy
concentrations of dots at large centers of population, maps such as this are
sometimes made on a county basis so that all the cases in a county are
more or less evenly distributed throughout the county. On the other
hand, when the dots are located at the exact place of occurrence, the heavy
concentrations of dots, which often become black blotches, indicate clearly
the concentrations of occurrences in those areas. Chart 76, another dot
map, shows the increase in the number of farms in the United States from
1930 to 1935. Here the concentrations are clearly evident and show up in
several places as areas of solid black.

In drawing a dot map, the number of units represented by one dot may
be large, so that the number of dots in a region is small enough to be
counted, or the number of units represented by one dot may be small, so
that the numerous dots give the effect of a gradual change in intensity of
shading from light to dark. Which technique to use depends on the
purpose of the chart.

Chart 71. Proportion of the Population of the United States in Each Specified Age
Group, 1850, 1870, 1890, 1910, and 1930. If the pie diagrams were shown separately,
a legend would be necessary to identify the various hatchings. (Data from same source
as Chart 67.)

Chart 72. Population of the United States in Each Specified Age Group, 1850, 1870,
1890, 1910, and 1930. If the pie diagrams were shown separately, a legend would be
necessary to identify the various hatchings. (Data from same source as Chart 67.)

Chart 73. Crop Conditions in the Drought Area. Average crops for 1930-1936
expressed as per cent of normal. The white areas were not considered as part of the
drought area. (From Works Progress Administration, Division of Social Research,
Areas of Intense Drought Distress, 1930-1936, p. 15.)

Chart 74. Legal Limit on Number of Driving Hours of Common-Carrier Truck
Drivers, by States, 1937. (National Safety Council, How Long on the Highway, p. 2:

Chart 75. Auto-Lite Factories and Service Stations in the United States, 1936.
(Reproduced from an advertisement of the Electric Auto-Lite Company.)

Chart 76. Increase in Number of Farms in the United States, 1930-1935. (From
the United States Department of Agriculture, Bureau of Agricultural Economics.)

Chart 77. Number of Drivers Interviewed and Location of Interview in a Study of
Driving Practices of Truckers, 1936. (Reproduced from National Safety Council, How
Long on the Highway, p. 19. Note that five of the states are not identified.)

Chart 78. A Portion of an Automobile Accident Map of the City of Watertown,
New York. The full effect of this map is not apparent in a black and white photograph.
The pins used were as follows: black, fatal accident; yellow, pedestrian injured; light
blue, automobile and bicycle; red, collision of two or more cars; crystal, light pole
broken by automobile. (From National Safety Council, Chicago, Illinois.)

Chart 79. A Portion of a Traffic Flow and Automobile Accident Map of Elgin,
Illinois, 1935. The width of the black lines indicates the number of vehicles during
the peak hour period. Classes range from 50 to 900 cars per hour. A large nickel pin
denotes a fatal accident, a red pin (black in the photograph) indicates a personal
injury, a white pin means property damage of less than $10, and a yellow pin
designates property damage of $10 or more. White and yellow pins both appear white
in the photograph. (Photograph of map furnished by the National Safety Council,
Chicago, Illinois.)

A different sort of dot map is shown in Chart 77, which uses dots of
varying size. In this study 4,030 truck drivers were stopped at various
places and were asked how long they had been driving and certain other
correlative questions. The areas of the circles indicate the relative number
of drivers questioned at each point. While the varying circle sizes indicate
clearly that more drivers were quizzed at certain places than at others, it
is not easy to make accurate comparisons from these dots. We cannot
compare diameters directly. We must remember that, if one circle has a
diameter twice as great as another, then the first circle has an area four
times that of the second.
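The relation between diameter and area may be illustrated with a short Python sketch. The function names and the counts of 100 and 400 are hypothetical; they merely show that, when areas are to be proportional to the numbers represented, diameters must vary as the square root of those numbers.

```python
import math


def area_ratio(diameter_ratio: float) -> float:
    """Ratio of two circle areas, given the ratio of their diameters."""
    return diameter_ratio ** 2


def diameter_for_count(count: float, reference_count: float,
                       reference_diameter: float) -> float:
    """Diameter whose circle area is proportional to `count`,
    scaled from a reference circle of known count and diameter."""
    return reference_diameter * math.sqrt(count / reference_count)


print(area_ratio(2))                       # doubling the diameter quadruples the area
print(diameter_for_count(400, 100, 1.0))   # four times the count needs only twice the diameter
```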

Pin maps. Pin maps may be thought of as a particularly flexible sort 
of dot map. They consist of maps mounted on backing of cork, cardboard, 
wallboard, corrugated cardboard, etc., on which information is recorded 
by means of pins having (usually) glass heads of different size, color, and 
shape. The available pins range in size from those having heads about
inch to about f inch in diameter. A large number of colors is available 



States at Each Census, 1890-1930. (Reproduced from National Resources Committee, 
Problems of a Changing Population, p. 61.)

as well as round-, square-, and triangular-head pins. Pin maps may be
readily altered as the facts change. Because of this flexibility and the
wide variety of pins available, the pin map is a very useful means of
presenting geographical data. An extensive pin map scheme, involving one or
more maps mounted on cork and hundreds or thousands of pins, is
expensive but may often prove very useful.

Charts 78 and 79 show two examples of pin maps used to record automobile
accidents. By using one or more such maps, it is possible to observe not
only the frequency with which accidents occur at various places, but also
the nature of each accident (automobile hitting pedestrian, automobile
hitting automobile, automobile hitting fixed object, etc.) and the
result of the accident (property damage, occupant injured, occupant
killed, pedestrian injured, pedestrian killed, etc.).

One difficulty with the statistical map is that the importance of different
regions is not to be judged by their areas. For instance, a hatched map 
showing income per family in different states would be somewhat mis- 
leading because there are many more families in some of the states occupy- 
ing a very small area than there are in other states occupying a very large 
area. An interesting device for overcoming this difficulty is to draw the 
map in such a way that the area of each state is in proportion to its popula- 
tion. Chart 230, page 741, shows such a map. 

Occasionally a map and some other type of chart are used in combination.
Chart 80 shows such a usage. The data to be presented consisted
of the percentage of farms operated by tenants at each of five censuses and
for each of six parts of the United States as well as for the country as a
whole. With the seven bar charts placed on the map, the reader may
visualize exactly what territory is referred to in each instance.

CHAPTER VII

RATIOS AND PERCENTAGES 


It was pointed out in the chapter dealing with statistical tables that 
derived figures are useful to assist in summarizing and comparing data. 
In that chapter specific mention was made of ratios, percentages, and 
averages. This chapter will discuss ratios and percentages. Averages 
and related measures will be examined in later chapters. 

To express the ratio which 753 bears to 251, we divide 753 by 251,
which gives 3, and we say that 753 is to 251 as 3 is to 1, or more briefly
753 : 251 : : 3 : 1. We have thus indicated the relationship which the
first of these two numbers bears to the second as a ratio to one. If it suited
our purpose better, we could express the relationship as a ratio to any other
number. For example, we could use a ratio to ten, saying 753 : 251 : :
30 : 10; we could use a ratio to one hundred and write 753 : 251 : : 300 :
100. This last ratio, per hundred, is generally referred to as a percentage,
and we note that 753 is 300 per cent (from per centum) of 251. It will
thus be seen that percentages, which are used so frequently, are merely
special cases of the more general concept of ratios. If, instead of using a
ratio per hundred, we find occasion for a ratio per thousand, we may refer
to our figures as "per mille."
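The several ways of stating the same ratio may be sketched in Python as follows. The function name `ratio_per` is a hypothetical helper introduced only for this illustration.

```python
def ratio_per(value: float, base: float, per: float = 1) -> float:
    """Express `value` relative to `base` as a ratio to `per`
    (1 for a ratio to one, 100 for per cent, 1000 for per mille)."""
    return value / base * per


print(ratio_per(753, 251))        # ratio to one
print(ratio_per(753, 251, 10))    # ratio to ten
print(ratio_per(753, 251, 100))   # per cent
print(ratio_per(753, 251, 1000))  # per mille
```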

Ratios are computed in order to expedite comparisons. Not only are 
large numbers reduced as in Table 3, but much is gained by comparing a 
series of figures with a rounded base of 100 (which can be carried in one's
mind) rather than by attempting to compare each individual economic 
class of exports with total exports and each economic class of imports with 
total imports. Relative change may be visualized more concretely when 
shown by percentages as in Table 20, or when shown by one of the methods 
in Table 21. 

Calculation 

When one or more numbers are being compared to another number, the
figure to which comparisons are made is known as the base. A ratio is
found by dividing the figure, which is being compared to the base, by the
base. (The use of calculating machines is discussed in Appendix C.)
The figure is then expressed in terms of or in relation to the base, and ratios
of all sorts are therefore sometimes referred to as relative numbers or
relatives.

TABLE 20

PRODUCTION AND YIELD PER ACRE OF SELECTED GRAINS IN THE UNITED STATES,
1936 AND 1937

                       Production          Per cent    Yield per acre    Per cent
                 (thousands of bushels)    increase       (bushels)      increase*
Grain                                      in pro-                       in yield
                     1936        1937      duction      1936     1937    per acre

Corn              1,507,089   2,644,995      75.5       16.2     28.2      74.1
Oats                785,506   1,146,258      45.9       23.5     32.7      39.1
Wheat               626,766     873,993      39.4       12.8     13.6       6.2
Barley              147,475     219,635      48.9       17.6     22.1      25.6
Grain sorghums       55,079      97,097      76.3        8.0     13.2      65.0
Rice                 49,002      53,004       8.2       50.6     48.5      -4.2
Rye                  25,319      49,449      95.3        9.1     12.9      41.8
Buckwheat             6,285       6,777       7.8       16.8     15.9      -5.4

* A minus sign denotes a decrease.

Source: Crops and Markets, Vol. 14, No. 12, December 1937, pp. 260-261.


TABLE 21

PRODUCTION OF POTATOES IN THE UNITED STATES, 1926-1937

                                      Per cent     Per cent      Per cent
         Production     Per cent      increase        of         increase*
Year     (thousands     of 1926         over      preceding   over preceding
         of bushels)                    1926         year          year

1926       321,607       100.0          ...          ...           ...
1927       369,644       114.9         14.9        114.9          14.9
1928       427,249       132.8         32.8        115.6          15.6
1929       332,204       103.3          3.3         77.8         -22.2
1930       340,572       105.9          5.9        102.5           2.5
1931       384,125       119.4         19.4        112.8          12.8
1932       376,425       117.0         17.0         98.0          -2.0
1933       342,306       106.4          6.4         90.9          -9.1
1934       406,105       126.3         26.3        118.6          18.6
1935       386,380       120.1         20.1         95.1          -4.9
1936       331,918       103.2          3.2         85.9         -14.1
1937       391,159       121.6         21.6        117.8          17.8

* A minus sign denotes a decrease.

Source: Crops and Markets, Vol. 14, No. 12, December 1937, p. 261.


The amount of money in circulation in the United States on June 30,
1914, was $3,459,434,174. On June 30, 1938, the circulating medium
totaled $6,461,058,390. To state the 1938 circulation in terms of the 1914
circulation (the base), we divide $6,461,058,390 by $3,459,434,174 and
obtain 1.868. This figure means that the circulation in 1938 was 1.868
times as great as in 1914. In many instances ratios are most
useful when given as percentages. To change 1.868, the ratio to one, to a
ratio per hundred, the decimal point is moved two places to the right; the
resulting figure, 186.8, indicates that money in circulation in 1938 amounted
to 186.8 per cent of that in 1914.

It should be noticed that there are two ways in which we can express the
percentage figure just arrived at. Instead of saying that the 1938
circulation was 186.8 per cent of 1914 circulation, we may state that circulation
in 1938 was 86.8 per cent greater than in 1914. In the first instance we
compared the totals for the two years; in the second we compared the
change which took place with the 1914 total.¹
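The arithmetic of this example may be checked with a brief Python sketch. The variable names are ours; the two dollar figures are those used in the division above.

```python
circulation_1914 = 3_459_434_174  # the base
circulation_1938 = 6_461_058_390

ratio = circulation_1938 / circulation_1914   # ratio to one
percent_of_base = 100 * ratio                 # ratio per hundred
percent_greater = percent_of_base - 100       # change compared with the base

print(round(ratio, 3))            # 1.868
print(round(percent_of_base, 1))  # 186.8
print(round(percent_greater, 1))  # 86.8
```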

Effect of Changing Base 

Naturally a different set of percentages would be arrived at if we com- 
pared the 1914 circulation figures to the 1938 figures. We are now using 
1938 as the base, and the 1914 figure is divided by that for 1938. Per- 
forming this operation indicates that circulation in 1914 was 53.5 per cent 
of that in 1938, or that circulation in 1914 was 46.5 per cent less than that 
in 1938. Observe that, while the 1938 figure was 86.8 per cent greater 
than the 1914 figure (1914 was the base), the 1914 figure was 46.5 per cent 
less than the 1938 figure (1938 was the base). This difference is, of course, 
due to the fact that the basis of comparison was first in reference to 1914, 
then to 1938. If a number is increased 100 per cent, the second number 
need be decreased but 50 per cent to arrive at the original figure. Con- 
versely, if a given number is decreased 50 per cent, the second number 
must be increased 100 per cent to reproduce the given number. 
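The effect of interchanging the base may be seen by computing the same comparison in both directions. In the sketch below the helper `percent_change` is a hypothetical name; a negative result denotes a decrease.

```python
def percent_change(value: float, base: float) -> float:
    """Per cent by which `value` exceeds `base` (negative for a decrease)."""
    return 100 * (value - base) / base


circulation_1914 = 3_459_434_174
circulation_1938 = 6_461_058_390

# 1914 as the base, then 1938 as the base.
print(round(percent_change(circulation_1938, circulation_1914), 1))  # 86.8
print(round(percent_change(circulation_1914, circulation_1938), 1))  # -46.5

# A 100 per cent increase is undone by a 50 per cent decrease.
print(percent_change(200, 100))   # 100.0
print(percent_change(100, 200))   # -50.0
```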

¹ Suppose we are comparing two percentages, as 4.0 per cent and 9.0 per cent. We
may speak in absolute terms and say that 9.0 per cent is 5.0 per cent more than 4.0
per cent. We may speak in relative terms and say that 9.0 per cent is 125 per cent
greater than 4.0 per cent, or that 9.0 per cent is 225 per cent of 4.0 per cent. When
comparing percentages, it is advisable to make quite clear whether we are speaking in
absolute or relative terms.

The failure to realize the effect of this change of base may lead to the
drawing of false conclusions. Some years ago a firm decreased the wages
of its employees 15 per cent; later it increased the reduced wages 5 per
cent; then it raised these increased figures 5 per cent; and finally it increased
these second figures another 5 per cent. Afterwards it announced that
the three 5 per cent increases put wages back where they were before the
15 per cent reduction. Calculation will show that the new wages were
really 98.4 per cent of the original wages before reduction. If the company
had given a single 15 per cent increase of the reduced wages, the
new wages would have been but 97.75 per cent of the original wages.
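The wage calculation may be verified with a short sketch in which each percentage change is applied to the figure resulting from the preceding change. The function name is hypothetical, and the original wage is taken as 100 for convenience. The last line, which is our own addition, shows the single increase that would actually have been needed to restore the original wage.

```python
def apply_changes(start: float, percent_changes: list[float]) -> float:
    """Apply successive percentage changes, each to the running figure."""
    value = start
    for p in percent_changes:
        value *= 1 + p / 100
    return value


three_raises = apply_changes(100.0, [-15, 5, 5, 5])
single_raise = apply_changes(100.0, [-15, 15])
needed_raise = 100 * (100 / 85 - 1)  # per cent increase that restores 85 to 100

print(round(three_raises, 1))  # 98.4
print(round(single_raise, 2))  # 97.75
print(round(needed_raise, 1))  # 17.6
```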

Table 22 shows for selected percentages of increase the per cent which 
the new number must be decreased to reproduce the original number. It 
should be borne in mind that a per cent of increase figure may be indefi- 
nitely large; however, a per cent of decrease figure of 100 indicates a de- 
cline to zero, while a per cent of decrease of over 100 indicates a fall to a 
negative quantity. 
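The relationship tabulated in Table 22 follows from a simple formula: if a number is increased p per cent, the new number must be decreased 100p/(100 + p) per cent to reproduce the original. A minimal Python sketch, with a hypothetical function name, is:

```python
def decrease_to_restore(percent_increase: float) -> float:
    """Per cent by which the increased number must be decreased
    to yield the original number again."""
    return 100 * percent_increase / (100 + percent_increase)


for p in [500, 200, 100, 50, 25, 10, 5, 1]:
    print(f"increase of {p:6.2f} per cent -> "
          f"decrease of {decrease_to_restore(p):5.2f} per cent")
```

However large the per cent of increase becomes, the corresponding per cent of decrease remains below 100.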

Recording Percentages 

Generally percentages are recorded to one decimal place. If the
percentages are based upon large figures and particularly if one part of a total
is quite small (see Tables 7 and 24), it may be desirable to use more than
one decimal. Occasionally only whole percentages are shown; these enable
relationships to be grasped readily. Whole percentages will not suffice,
however, when the relative variations are extremely small.

TABLE 22

ILLUSTRATIONS OF EFFECT OF SHIFTING BASE IN CALCULATING PERCENTAGES

                                               Per cent new number
Given number   Per cent of    New number      must be decreased to
                 increase                      yield given number

     10           500.00         60.00               83.33
     10           200.00         30.00               66.67
     10           100.00         20.00               50.00
     10            50.00         15.00               33.33
     10            33.33         13.33               25.00
     10            25.00         12.50               20.00
     10            10.00         11.00                9.09
     10             5.00         10.50                4.76
     10             1.00         10.10                 .99


Percentages should not be calculated if the absolute numbers are small,
especially if the base is appreciably less than 100. A serious difficulty
arising out of the use of percentages based on small absolute numbers is
discussed on pages 160-161.

When percentages are to be recorded with one decimal, they are correct
to the nearest tenth of one per cent. The following examples will indicate
the procedure in rounding percentages (and also in rounding other
calculations involving remainders):

(1) $371.16 ÷ $679.28 = .5464, or 54.64 per cent. The second decimal
is less than 5 and therefore this percentage, to the nearest tenth of one per
cent, is 54.6.

(2) 2,319 pounds ÷ 7,532 pounds = .3079, or 30.79 per cent. In this
instance the second decimal is more than 5 and the percentage should be
recorded as 30.8.

(3) 280,511 feet ÷ 11,000,000 feet = .025501, or 2.5501 per cent. Here
the second decimal is 5, but there is a remainder which results in the 1 in
the fourth decimal place. Recorded to the nearest tenth of one per cent
this figure is 2.6.

(4) 1,341 barrels ÷ 6,000 barrels = .2235, or 22.35 per cent. Here the
nearest tenth is either 22.3 or 22.4. It does not greatly matter whether
occasional results such as this are raised in the first decimal place or
whether the second decimal is dropped. However, it is better to follow
some consistent scheme. Particularly when many ratios are being
calculated which are eventually to be added, it is well to employ a method which
will cause half of the ratios with a second decimal of exactly 5 to be raised
and half to be lowered. This practice will avoid the accumulation of
errors. Probably the most satisfactory scheme is to raise the first decimal
when the first decimal is an odd number (67.35 becomes 67.4) and to drop
the second decimal when the first decimal is an even number (67.65 becomes
67.6).
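The scheme just described is what is now commonly called rounding half to even. As a sketch only, it may be reproduced with Python's `decimal` module and its `ROUND_HALF_EVEN` mode; the helper name `round_percent` is hypothetical. The percentages are passed as strings so that a figure such as 22.35 is taken exactly as written.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_percent(value: str, places: str = "0.1") -> Decimal:
    """Round a percentage to the nearest tenth, sending an exact final 5
    to whichever neighbor has an even last digit."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)


# The four worked examples, followed by the two illustrations of the scheme.
for raw in ["54.64", "30.79", "2.5501", "22.35", "67.35", "67.65"]:
    print(raw, "->", round_percent(raw))
```

Under this scheme example (4) is recorded as 22.4, since the first decimal, 3, is odd.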

Reference to the percentage data shown in Table 4 will reveal that the
nine percentages add to 100.1 rather than to 100.0. This is the consequence
of rounding all percentages to one decimal place, which fairly often results
in totals of 99.9 or 100.1 and occasionally shows 99.8 or 100.2. Some
statisticians adjust one of the percentages in order to produce the correct total,
but it seems preferable to let each percentage stand correctly rounded, as
in Table 4. It is interesting to note that, if the individual percentages are
carried to one more decimal place than is the total, this apparent
discrepancy does not occur.
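How rounded percentages may fail to add to exactly 100.0 can be seen from a small sketch. The counts used here are hypothetical (they are not the figures of Table 4), and the helper name is our own.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def rounded_shares(parts: list[int], places: str = "0.1") -> list[Decimal]:
    """Percentage of the total for each part, each rounded separately."""
    total = sum(parts)
    return [
        (Decimal(100 * p) / Decimal(total)).quantize(
            Decimal(places), rounding=ROUND_HALF_EVEN)
        for p in parts
    ]


thirds = rounded_shares([1, 1, 1])            # three equal parts
sixths = rounded_shares([1, 1, 1, 1, 1, 1])   # six equal parts

print(thirds, "total", sum(thirds))   # each 33.3, total 99.9
print(sixths, "total", sum(sixths))   # each 16.7, total 100.2
```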

Types of Comparisons 

We have already seen instances in which the parts of a whole are com- 
pared to the total in Tables 3, 4, and 7. Here the percentages were ob- 
tained by dividing each item in turn by the total. More expeditiously 
we may take the reciprocal of the total and multiply the reciprocal by each 
of the component figures. This is a time-saving device adapted particu- 
larly to the calculating machine, and is applicable whenever we are dividing 
a series of numbers by a constant number. 
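The reciprocal device may be sketched as follows. The three component figures are hypothetical; the point is only that one division (to obtain the reciprocal) is followed by a series of multiplications, and that the results agree with those obtained by dividing each item by the total.

```python
components = [120, 45, 35]   # hypothetical parts of a total
total = sum(components)

reciprocal = 1 / total       # computed once
by_reciprocal = [100 * c * reciprocal for c in components]
by_division = [100 * c / total for c in components]

print([round(p, 1) for p in by_reciprocal])
print([round(p, 1) for p in by_division])
```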

An example in which one part of a total is compared with another part 
of another total is given in Table 8. In this table each figure for males 
was divided by the appropriate figure for females, since the sex ratio con- 
sists in stating the number of males per 100 females. 

Table 21 indicates a number of different comparisons which may be made
in regard to data arranged chronologically. In column 3 the production
of potatoes for each year is compared with the 1926 production; each figure
is divided by that for 1926. Column 4 shows the percentage by which
the production for each year exceeded that for 1926; each year's
numerical increase or decrease over 1926 is divided by the 1926 production.
In column 5 the production each year is related to that of the preceding
year; each year's figure is divided by that for the preceding year. Column
6 indicates the per cent of increase or decrease of each year in relation to
the preceding year; the numerical increase (or decrease) of each year over
the preceding year is divided by the production for the preceding year.
In columns 3 and 4, comparisons are made with a fixed base, 1926. In
columns 5 and 6 the base is constantly shifting, being always the preceding
year.
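The four kinds of comparison may be sketched in Python, using the potato production figures for the first four years of Table 21. The variable names are ours, and `None` marks the entries which cannot be computed for 1926 because there is no preceding year in the table.

```python
# Production of potatoes, thousands of bushels (Table 21, 1926-1929).
production = {1926: 321_607, 1927: 369_644, 1928: 427_249, 1929: 332_204}

fixed_base = production[1926]
rows = []
previous = None
for year, bushels in production.items():
    pct_of_1926 = 100 * bushels / fixed_base            # column 3
    pct_over_1926 = pct_of_1926 - 100                   # column 4
    if previous is None:
        pct_of_prev = pct_over_prev = None
    else:
        pct_of_prev = round(100 * bushels / previous, 1)          # column 5
        pct_over_prev = round(100 * bushels / previous - 100, 1)  # column 6
    rows.append((year, round(pct_of_1926, 1), round(pct_over_1926, 1),
                 pct_of_prev, pct_over_prev))
    previous = bushels

for row in rows:
    print(row)
```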

Another application of percentages is shown in Table 20. Here the
1936 figure for each crop is the base. The percentage columns headed
"per cent increase" indicate the relative increase or decrease in each crop
from 1936 to 1937.


Some Frequently Used Ratios 

The following paragraphs indicate a few interesting applications of ratios 
and percentages. The reader will doubtless become aware of many others 
as he reads more or less technical material in magazines, newspapers, 
books, and advertisements. 

Index numbers. Most index numbers are presented in the form of
percentages.² In the construction of an index number of wholesale prices,
for example, the commodities to be included are selected first, and their 
prices are then combined with due regard to the varying importance of 
the different commodities. If the index number is a chronological one, as 
is usually the case, some year may be designated as the base and prices in 
that year are set equal to 100. The prices for the other years are then ex- 
pressed in relation to that base year. The United States Bureau of Labor 
Statistics uses 1926 as the base year for its index numbers of 813 wholesale 
prices. Wholesale prices in 1926 are therefore represented by 100. The 
index number for 1928 was 96.7 ; for 1929 it was 95.3; it fell to 64.8 in 1932, 
rose to 86.3 in 1937, and dropped to 78.6 in 1938. Prices in each of these 
later years are thus expressed in terms of 1926, which is regarded as a
representative or "normal" year.

² See Chapters XX and XXI for a more complete discussion of index numbers.

Sex ratio. The relationship of the number of males to the number of
females in the population is given by the sex ratio, which states the number
of males per 100 females. In 1930 there were 62,137,080 males and
60,637,966 females in the United States. There were thus 102.5 males
per 100 females in the United States, as shown in Table 8. This ratio
varied in the different states. It was lowest in Rhode Island, where there
were 95.2 males per 100 females, and highest in Nevada, where there were
140.3 males per 100 females. The various nativity groups in the
population are listed in Table 8. Negroes showed 97.0 males per 100 females,
native whites 101.1 males per 100 females, foreign-born whites 115.1 males
per 100 females, Japanese 143.3 males per 100 females, Chinese 394.7
males per 100 females, and Filipinos 1437.7 males per 100 females.
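The sex ratio for the United States may be verified from the two census totals just given. The sketch below uses a hypothetical function name.

```python
def sex_ratio(males: int, females: int) -> float:
    """Number of males per 100 females."""
    return 100 * males / females


us_1930 = sex_ratio(62_137_080, 60_637_966)
print(round(us_1930, 1))  # 102.5
```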

Population density. Instead of merely comparing the total population 
of two communities, it may often be more meaningful to consider the 
density of the population. We do this by dividing the total population 
by the area in square miles, and thus determine the number of persons per
square mile. For example, in 1930 the population of Montana was 537,606 
and the population of New Hampshire was 465,293. If we relate these 
figures to the area of each state, we find that New Hampshire had 26.7 
persons per square mile, while Montana had but 3.7 persons per square 
mile. These figures do not, of course, mean that there were 26 or 27 
persons on every square mile in New Hampshire and 3 or 4 persons on 
every square mile in Montana. They are merely summary figures indi- 
cating that, on the average, there were the indicated number of persons 
per square mile in the state. 

Population density may also be used in making chronological compari- 
sons. As our country has grown older, the population density has in- 
creased. In 1800 there were 6.1 persons per square mile in the United 
States; in 1930 there were 41.3 persons per square mile. 

Persons per family. With a decline in birth rates there is an accompany- 
ing decrease in the size of families. Thus in 1920 there were 24,351,676 
families in the United States and a total population of 105,710,620. The 
average number of persons per family was thus 4.34 in 1920. At the 
following census there were 29,979,841 families and a total population of 
122,775,046. The average family in 1930 was thus composed of 4.10
persons. The term "family" as used here included 75,178 quasi-family
groups (institutions, hotels, etc.) in 1930. Quasi-family groups were in- 
cluded but not separately counted in 1920. 
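The two averages may be checked directly from the census figures quoted above. The function name in this sketch is hypothetical.

```python
def persons_per_family(population: int, families: int) -> float:
    """Average number of persons per family."""
    return population / families


average_1920 = persons_per_family(105_710_620, 24_351_676)
average_1930 = persons_per_family(122_775_046, 29_979_841)

print(f"1920: {average_1920:.2f}")  # 4.34
print(f"1930: {average_1930:.2f}")  # 4.10
```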

Ratios per capita. Many figures are more meaningful or more useful 
when expressed on a per capita basis. The costs of government in the 
various states reflect not only the level of expenditure and government 
services but also the population of the states. For example, the cost of 
operation and maintenance of the general departments of the State of 
New York in 1937 amounted to $336,965,861, while in New Jersey it totaled 
$85,196,172. If these figures are each divided by the population of the
respective state, it appears that the cost was $25.95 per capita in New York
and $19.63 per capita in New Jersey. 

The consumption of various commodities is frequently stated on a 
per capita basis. Thus in the period July 1936-June 1937 the “apparent
consumption” of oleomargarine (amount withdrawn for consumption)
was 3.0 pounds per capita; the estimated consumption of rice was 6.0
pounds per capita; the approximate amount of refined sugar consumed
(available for consumption) was 97.4 pounds per capita. The apparent 
consumption of beef, veal, mutton, lamb, and pork was 126.8 pounds per 
capita during the calendar year 1936. 

The chronological comparison of figures is also frequently facilitated by 
relating them to the population. On June 30, 1926, the amount of money 
in circulation was $4,885,266,000. By June 30, 1938, this figure had in- 
creased to $6,461,058,390. During this same period, however, the popu- 
lation had been increasing so that the money in circulation had to serve 
a larger group of people. Expressing the money in circulation in terms of 
the population, we find that the per capita money in circulation amounted 
to $41.71 in 1926 and $49.67 in 1938. 

Death rates. The crude, gross, or general death rate for a given year 
is obtained by dividing the number of deaths occurring in a community 
during that year by the mid-year population of that community, and 
expressing the result in terms of per thousand. In 1936 there were in the 
United States 1,479,228 deaths from all causes. The 1930 census, taken
as of April 1, 1930, enumerated 122,775,046 persons; and the June 30, 
1936, population was estimated to be 128,429,000. The death rate for 
1936 was therefore 1,479,228 ÷ 128,429,000 = .0115, or 11.5 per thousand.
It will be seen that the accuracy of a death rate depends first upon the 
degree of completeness of the registrations of deaths, and second upon the 
accuracy of the mid-year population estimate used as a base. Since 
population counts are made only once in 10 years, most of the population 
figures used must be estimates. When the population is estimated for a 
year falling between two censuses, the estimate is termed an inter-censal
estimate; when the estimate is for a year after a census, it is termed a 
post-censal estimate. Inter-censal estimates are naturally somewhat more 
accurate than post-censal estimates. For the years 1931 to 1939 inclusive, 
death rates must at present be based upon post-censal estimates and are 
called preliminary rates. After the 1940 census results are available, inter- 
censal estimates may be made for the years 1931-1939, and the death rates 
may be recomputed upon the basis of these new population estimates. 
Such rates are called revised rates.
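
As a minimal sketch of the calculation described above, the crude death rate for 1936 may be computed as follows. The function name and the `per` parameter are illustrative assumptions; the same function would serve for specific rates stated per 100,000.

```python
def crude_death_rate(deaths, midyear_population, per=1_000):
    """Deaths during the year per `per` persons of the mid-year population."""
    return deaths / midyear_population * per


# Figures from the text: deaths in 1936 and the June 30, 1936, estimate.
rate_1936 = crude_death_rate(1_479_228, 128_429_000)
print(f"{rate_1936:.1f} per thousand")
```

The result is only as good as its two inputs: the completeness of death registration and the accuracy of the population estimate used as the base.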

When the deaths occurring in a state or city are divided by the popula- 
tion of that community, the resulting crude death rate is subject to certain 
corrections. For example, in any given year people may die in a commu-
nity who are residents elsewhere and also some residents of any large
community may die outside of that community. If the non-resident 
deaths are deducted from those which occurred in the community, the 
resulting rate is referred to as a local rate. If, in addition, the deaths of 
residents occurring outside of that community are added, the resulting 
rate is referred to as a resident rate. Failure to recognize these important 
differences may lead to drawing false conclusions. In February 1935 it 
was announced that the death rate for Queens, borough of New York City, 
was 6.5 per 1,000, for Bronx 7.8, for Brooklyn 9.3, for Richmond 13.5, and 
for Manhattan 16.3. The death rate for Queens was lower than for any 
other such community in the United States, and some persons promptly 
announced that Queens was “the healthiest place in the country.” It
was very quickly pointed out in the press, however, that Queens possessed 
a very low quota of hospitals and that, therefore, some residents of Queens 
in need of hospital care would seek it in Manhattan or elsewhere. Hospital 
cases naturally show a very high death rate, and a crude death rate would 
not reflect the fact that some persons dying in Manhattan and elsewhere 
were really residents of Queens. 

Death rates for particular groups of the population (males and females, 
various age groups, etc.) and for particular diseases or causes are referred 
to as specific death rates. Because the deaths from any one cause are 
relatively few, specific rates are usually stated per 100,000 of the popula- 
tion. Thus in 1936 the death rate for diphtheria was 2.4 per 100,000. 

An intelligent comparison of the death rates of different communities 
involves the necessity of adjusting for the fact that the proportions of the 
sexes may differ, for differences in the age distribution of the population, 
for variations in the racial and nativity composition of the inhabitants, for 
differences in occupations, and for other factors. A discussion of these 
differences and the methods of computing adjusted death rates is too 
specialized a topic to be treated in this text.³

Birth rates. Birth rates are usually calculated by dividing the births 
during a year by the mid-year population for that year. Just as in the 
case of death rates we may have preliminary rates and revised rates. We 
may also have gross, local, and resident rates. Stillbirths are not counted 
as births, although they have been so counted in the past; this fact should 

³ See George Chandler Whipple, Vital Statistics, Chs. VIII, X, and XII, second
edition; John Wiley and Sons, Inc., New York, 1923. The adjustment for age distri-
bution, for example, consists of determining what the death rate would have been if
the ages were those of the “standard million” (based on the population of England and
Wales in 1901). The death rate for each age group as observed in a community is
applied to the number of persons in the corresponding age group of the standard million
and the total of these “computed deaths” is related to 1,000,000 to give the death rate
adjusted to the standard age distribution.

be remembered in making chronological comparisons. Perhaps it is also
worth while calling attention to the fact that the registration of births is 
not so complete as is the registration of deaths. A death must be registered 
before a burial permit may be issued and before interment may be made. 
A newborn infant, however, may be absorbed into the family and the
community whether or not his birth is registered. 

The calculation of birth rates in relation to the total population is not 
thoroughly satisfactory since the proportion of “child producers” in the
population is not constant either from time to time or from place to place. 
Refinements in the calculation of birth rates involve the separation of 
legitimate and illegitimate births, the comparison of legitimate births to 
the number of married women of child-bearing age, and the comparison 
of illegitimate births to the total population or to the number of unmarried 
women of child-bearing age.⁴

Crop yields per acre. Data of the total amount of a crop produced may 
tell us whether or not there is more of that commodity available in one
year than in another. From such figures, however, we cannot know if 
an increase may have been due to a more abundant yield or to an increase 
in acreage. In 1933 there were 528,975,000 bushels of wheat harvested from 
47,910,000 acres in the United States; in the following year 42,235,000 
acres yielded 496,469,000 bushels. Although the total crop was smaller, 
the yield per acre had risen. In 1933 it was 11.0 bushels per acre, while 
in 1934 it was 11.8. The yield per acre was 12.2 bushels in 1935, 12.8 
bushels in 1936, and 13.6 bushels in 1937. On a geographical basis it is 
interesting to note that in 1937 the yield varied from 5.6 bushels per acre
in South Dakota to 25.6 bushels per acre in Nevada. Our leading wheat 
states (Kansas and North Dakota, for example) did not show the greatest 
yield per acre. 
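
The point that a smaller crop may go with a higher yield can be verified with a brief sketch using the 1933 and 1934 figures quoted above (the variable names are our own):

```python
# (bushels harvested, acres harvested), as quoted in the text.
wheat = {
    1933: (528_975_000, 47_910_000),
    1934: (496_469_000, 42_235_000),
}

yield_per_acre = {
    year: bushels / acres for year, (bushels, acres) in wheat.items()
}

for year, bushels_per_acre in yield_per_acre.items():
    print(year, f"{bushels_per_acre:.1f} bushels per acre")
```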

Hog-corn ratio. The hog-corn ratio is the result of dividing the average
price per 100 pounds which farmers receive for hogs by the average price 
per bushel which farmers receive for corn. For example, if, as in March 
1938, farmers are receiving $8.35 per 100 pounds for hogs and $.513 per 
bushel for corn, the ratio is $8.35 ÷ $.513 = 16.3. This ratio may be
interpreted to mean that 100 pounds of hogs are 16.3 times as valuable
as a bushel of corn or, more simply, that 16.3 bushels of corn are equal in
value to 100 pounds of hogs. In April 1938 hogs brought $7.77 per 100
pounds and corn yielded the farmer $.527 per bushel. At that time the 
ratio was 14.7. Over the 26-year period 1910-1935 the hog-corn ratio 
averaged about 11.1, falling as low as 6.0 in December 1934 and reaching 


⁴ For a more complete discussion of birth rates, see Whipple, ibid., pp. 246-251. The
author also discusses marriage rates, divorce rates, morbidity rates, fatality ratios, and
so forth.

18.7 in June 1926. When the ratio is low, it is more profitable for farmers
to sell their corn outright than to feed the corn to hogs being fattened for 
market. When the ratio is high, it becomes more profitable for the 
farmer to feed corn to his hogs than to sell the corn outright. Since corn 
is the principal element of cost in producing hogs for market, the ratio is 
used as an indicator of the desirability of future expansion or contraction 
of hog production. There is thus a relationship between the hog-corn 
ratio and the hog production cycle. When the ratio is high, an increase in
hog production tends to follow. Such an increase is frequently followed by 
a decline in hog prices in relation to corn prices, and there then follows a 
tendency to restrict hog production. Curves showing hog-corn ratios are 
shown in Charts 49 and 256. 
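
The two monthly ratios cited in this section can be reproduced with a short sketch. The function name is an illustrative choice of ours; the prices are those quoted for March and April 1938.

```python
def hog_corn_ratio(hog_price_per_100_lb, corn_price_per_bushel):
    """Bushels of corn equal in value to 100 pounds of hogs."""
    return hog_price_per_100_lb / corn_price_per_bushel


march_1938 = hog_corn_ratio(8.35, 0.513)
april_1938 = hog_corn_ratio(7.77, 0.527)

print(f"March 1938: {march_1938:.1f}")
print(f"April 1938: {april_1938:.1f}")
```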


TABLE 23

Individual Batting Averages of 22 Outstanding American League Players, 1937

Player and club                     Games   Times    Hits   Batting
                                           at bat           average*

Gehringer, Charles L., Detroit        144     564     209     .371
Gehrig, Henry L., New York            157     569     200     .351
DiMaggio, Jos. P., Jr., New York      151     621     215     .346
Bonura, Henry J., Chicago             116     447     154     .345
Travis, Cecil H., Washington          135     526     181     .344
Bell, Roy C., St. Louis               156     642     218     .340
Greenberg, Henry, Detroit             154     594     200     .337
Walker, Gerald H., Detroit            151     635     213     .335
Dickey, William N., New York          140     530     176     .332
Fox, Ervin, Detroit                   148     628     208     .331
Stone, John T., Washington            139     542     179     .330
West, Samuel F., St. Louis            122     457     150     .328
Selkirk, George A., New York           78     256      84     .328
Vosmik, Joseph F., St. Louis          144     594     193     .325
Radcliff, Raymond A., Chicago         144     584     190     .325
Solters, Julius J., Cleveland         152     589     190     .323
Moses, Wallace, Jr., Philadelphia     154     649     208     .320
Henrich, Thomas D., New York           67     206      66     .320
Appling, Lucius B., Chicago           154     574     182     .317
Allen, Ethan N., St. Louis            103     320     101     .316
Pytlak, Frank A., Cleveland           125     397     125     .315
Lewis, John K., Jr., Washington       156     668     210     .314

* This column is headed “PC” in the original table.

Source: Official Baseball Guide, 1938, pp. 112-113, American Sports Publishing Company.

Batting averages. The familiar batting average of the sport pages of 
the daily paper is a ratio of the hits made by a batter in relation to the total
number of times he was at bat. Table 23 shows a series of selected batting 
averages. The figures in the last column of Table 23 may be correctly 
thought of as either ratios to one or as averages of a series of observations 
each having a value of 1 or 0 (that is, either the batter did or did not make
a hit). If a man has been at bat 75 times and has made 25 hits, his batting
average would be shown as .333 and is spoken of as “three hundred and
thirty-three.” If he had made a hit every time he was at bat, his figure
would be 1.000, which is referred to as “one thousand.” Notice that cer-
tain contradictions are involved in some of the terms used to refer to these
data. The column of figures is frequently headed “percentage”; the fig-
ures are printed as ratios to one; the figures are spoken of as per thousand!
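
The statement that a batting average is both a ratio and an average of observations valued 1 or 0 can be illustrated with a small sketch (the list of observations is constructed for the 75 times at bat and 25 hits mentioned above):

```python
# 75 times at bat: 25 hits (scored 1) and 50 times without a hit (scored 0).
observations = [1] * 25 + [0] * 50

# The mean of the 1/0 observations equals hits divided by times at bat.
average = sum(observations) / len(observations)

# The same figure under the three conventions noted in the text.
print(f"ratio to one: {average:.3f}")
print(f"per cent:     {average * 100:.1f}")
print(f"per thousand: {average * 1000:.0f}")
```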

Airplane accident ratios. The safety of air travel may be reflected by 
means of ratios. The number of miles flown during a year (or other 
convenient period) may be divided by the number of accidents to obtain 
"miles flown per accident.” In 1937 domestic air lines flew 66,071,507 
miles and 42 accidents occurred. The lines therefore flew 1,573,131 miles 
per accident. In the same year there were 5 accidents involving a fatality, 
and dividing the mileage flown by 5 gives 13,214,301 miles per fatal acci- 
dent. During 1937 there were 40 passenger fatalities as a result of air- 
plane accidents, and it appears that domestic air lines flew 1,651,788 miles 
per passenger fatality. Passenger fatalities may be related to passenger 
miles, and since domestic air lines flew 476,603,165 passenger miles in 1937 
we have 476,603,165 ÷ 40 = 11,915,079 passenger miles flown per pas-
senger fatality. Because of the small number of accidents and fatalities
involved, these ratios fluctuate tremendously from year to year. For 
example, the passenger miles flown per passenger fatality were 3,500,607 
in 1930; 21,686,515 in 1933; 11,050,508 in 1934; 20,927,034 in 1935; and 
9,903,188 in 1936. It will be observed that, as air travel becomes safer, 
all of the ratios mentioned will grow larger. It would also be possible 
(though not customary) to compute the ratio of the number of accidents or 
fatalities per million miles flown. Such ratios would be reciprocals of those 
given and, as air travel becomes safer, would approach zero. 
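
The 1937 ratios above, together with the reciprocal form mentioned at the close of the paragraph, may be sketched as follows. The variable names are ours; the figures are those quoted in the text.

```python
miles_flown = 66_071_507
passenger_miles = 476_603_165
accidents = 42
fatal_accidents = 5
passenger_fatalities = 40

miles_per_accident = miles_flown / accidents
miles_per_fatal_accident = miles_flown / fatal_accidents
miles_per_passenger_fatality = miles_flown / passenger_fatalities
passenger_miles_per_fatality = passenger_miles / passenger_fatalities

# Reciprocal form: accidents per million miles flown.
# This figure falls toward zero as air travel becomes safer.
accidents_per_million_miles = accidents / (miles_flown / 1_000_000)

print(f"{miles_per_accident:,.0f} miles per accident")
print(f"{passenger_miles_per_fatality:,.0f} passenger miles per passenger fatality")
print(f"{accidents_per_million_miles:.2f} accidents per million miles")
```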

The 100 per cent statement. When banks, insurance companies, and 
other corporations present financial information to the public, they find
it effective to supplement the dollar figures with percentages. Thus a 
financial statement may show each asset as a percentage of all assets, and 
each liability as a percentage of all liabilities. The procedure is particu-
larly effective when the dollar figures are large. Table 24 shows the assets
of the New York Life Insurance Company as set forth in an annual report. 
The actual figures are too large for the ordinary reader to grasp and com-
pare, but the percentage data make comparisons less difficult. In pre-
paring such a percentage statement it is desirable not to show too many
decimal places, else comparisons cannot readily be made. A recent state- 
ment of the resources of a bank carried all percentages to three decimal 
places. This was quite unnecessary, particularly since the smallest item 
“sundry securities” was .035 (.0349) per cent and could have been shown
as .03 per cent, and since the second smallest item “other assets” was .039
per cent and could have been shown as .04 per cent. For popular pre-
sentation there is some advantage in lumping such small items together
in order to center attention upon the more important ones. These two 
small items, if combined, would have appeared as .07 per cent or as .1 
per cent, and all percentages could have been shown to but one decimal
place. However, it may have been desired to emphasize the smallness of
either “sundry securities” or “other assets,” or both.

TABLE 24

Assets of the New York Life Insurance Company, December 31, 1937

Asset                                                                  Amount*   Per cent of total

Cash on hand, or in bank                                     $   64,231,858.43          2.55
United States Government, direct, or fully guaranteed bonds     512,300,999.54         20.33
State, county and municipal bonds                               254,845,789.65         10.11
Railroad bonds                                                  297,213,924.28         11.79
Public utility bonds                                            229,437,611.57          9.10
Industrial and other bonds                                       49,549,133.97          1.97
Canadian bonds                                                   59,771,724.10          2.37
Foreign bonds                                                       133,671.00           .01
Preferred and guaranteed stocks                                  81,644,201.00          3.24
Real estate owned (including home office)                       140,089,034.62          5.56
Foreclosed real estate subject to redemption                      2,265,334.31           .09
First mortgages on city properties                              405,082,891.33         16.07
First mortgages on farms                                          6,936,336.77           .27
Policy loans                                                    355,265,818.60         14.09
Interest and rents due and accrued                               30,149,211.77          1.20
Net amount of uncollected and deferred premiums                  31,358,413.78          1.24
Other assets                                                         74,261.64           .01

Total admitted assets                                        $2,520,350,216.36        100.00

* Bonds eligible for amortization are carried at their amortized values determined in accordance with
the laws of the State of New York. All other bonds and all guaranteed and preferred stocks are carried
at market […] by the National Association of Insurance Commissioners. Securities
amounting […], included above, are deposited as required by law.

Source: […] Report of the New York Life Insurance Company, p. 6.

Railroad ratios. The efficient operation of railroads necessitates the 
collection and use of a vast amount of statistical data in connection with 
which numerous ratios are calculated. 

The investment per mile of line is obtained by dividing total investment 
(including roadway, tracks, equipment, stations, shops, etc.) by the num- 
ber of miles of railroad line. This figure was $105,922 per mile in 1936. 

Freight revenue per ton mile is obtained by dividing total freight revenue 
by the total number of ton miles of freight hauled. In 1937 the “revenue 
per ton mile” for class I railroads was .935 cents. Similarly, we may
compute the “revenue per passenger mile,” which amounted to 1.79 cents
in 1937.

The operating ratio is the ratio of operating expenses to operating reve- 
nues. In 1937 operating expenses were $3,119,064,323, while operating 
revenues were $4,166,068,600. The operating ratio was 74.87 per cent.

There are a number of other railroad ratios; the meaning of each is 
rather obvious. Enumerating a few for class I railroads in 1937: the 
revenue per ton of freight was $1.85; the haul per ton of freight was 198.0 
miles; the revenue per passenger was $1.59; the haul per passenger was
49.6 miles; the rate of return on aggregate property investment was 2.26
per cent; the hours worked during the year per railroad employee were
2,512; the percentage of unserviceable freight locomotives averaged 25.5
during the year, while 10.1 per cent of the freight cars were in the same
condition; the ton miles per day per freight car were 562; the mileage per
day per freight car was 32.9 miles.⁵

The railroad ratios mentioned above are one type of business ratios. 
Many sorts of business organizations compute diverse ratios for the better 
functioning of the enterprise. Discussed in another volume⁶ are such
ratios as current ratio (current assets ÷ current liabilities), merchandise
turnover (net sales ÷ merchandise inventory), margin of profit (profit ÷
sales), labor turnover (replacements ÷ number on payroll), and others.

Faulty Use of Percentages 

Ratios and percentages are in such general use that it is not surprising 
to find them occasionally misused. Difficulties encountered in the calcu- 
lation and use of percentages can generally be traced to one of the following 
causes: (1) confusion in regard to the base, (2) calculation of percentages 
based on small absolute numbers, (3) misplaced decimal points, (4) arith- 
metic mistakes, (5) improper procedure in averaging percentages, (6) the 
use of percentages which are awkwardly large. These will be discussed 
in order. 

Confusion in regard to base. Some years ago the dean of a mid-western 
veterinary college was reported to have stated that over a period of five 
years (1916 to 1921) the enrollment in veterinary colleges in the United 
States had decreased 500 per cent. This would indeed mean a very small 
registration, since a decrease of 500 per cent would mean a negative figure 
four times the size of the original enrollment! The absolute figures showed

⁵ For these and other railroad ratios, see A Yearbook of Railroad Information, issued
annually by the Committee on Public Relations of the Eastern Railroads, New York.

⁶ See F. E. Croxton and D. J. Cowden, Practical Business Statistics, pp. 139-149,
Prentice-Hall, Inc., New York, 1934.

an original enrollment of 3,160 students, which decreased to 641 five years
later. The decrease was 2,519 students or 79.7 per cent. What the 
probably-misquoted dean most likely said was that the enrollment in the 
earlier year was 500 per cent of the enrollment in the later year. 

In the autumn of 1920 a determined effort was made by the United 
States district attorney to have restaurants in Pittsburgh lower their prices 
to a pre-war level. Newspapers announcing the success of the drive 
stated that Pittsburgh restaurants had cut their prices 50 to 100 per cent. 
It is, of course, clear that prices cannot be cut 100 per cent, else the servings 
formerly sold would be given away! The price reductions on a number of 
dishes were stated; the greatest reduction took place in the price of dough-
nuts and pie. These had formerly sold at 15 cents per order. Identical
sized servings were sold at 5 cents after the reduction; hence the reduction 
amounted to 66.7 per cent of the former selling price. 
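
The distinction between a percentage change and a percentage "of" a base, which underlies both the enrollment and the restaurant examples, can be sketched as follows. The two function names are our own illustrative choices.

```python
def percent_change(old, new):
    """Change from old to new, expressed as a percentage of the old (base) value."""
    return (new - old) / old * 100


def percent_of(value, base):
    """Value expressed as a percentage of the base."""
    return value / base * 100


# Veterinary enrollment: 3,160 students falling to 641.
print(f"{percent_change(3_160, 641):.1f}")  # a decrease of about 79.7 per cent
print(f"{percent_of(3_160, 641):.0f}")      # earlier enrollment as per cent of later

# Doughnuts and pie: 15 cents reduced to 5 cents.
print(f"{percent_change(15, 5):.1f}")

# A cut to zero is a decrease of exactly 100 per cent; no price can fall further.
print(percent_change(15, 0))
```

Stating which figure serves as the base removes the ambiguity that produced the reported "500 per cent" decrease.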

Accounts appearing in newspapers in the spring of 1934 made public 
the results of a study of maternal mortality. There had been enumerated 
1,343 cases of preventable deaths. The attending physicians were held 
responsible for 61.1 per cent, the patients themselves were said to be
responsible for 36.7 per cent, while midwives were allegedly responsible 
for 2.2 per cent. The percentages were misleading because doctors attend 
a large proportion of confinement cases while midwives attend but a few. 
A proper procedure for calculating the percentages consists of, first, relating 
the number of deaths for which the physicians allegedly were responsible 
to the total number of cases attended by physicians, and, second, relating 
the number of deaths for which midwives presumably were responsible 
to the total number of cases attended by midwives. Upon such a basis the 
New York Obstetrical Society announced⁷ that, according to the same
study mentioned above, “responsibility is ascribed to the physician in 678
maternal deaths, which is 47 per cent of the deaths occurring among
patients attended by the physician, while the midwife was responsible for
29 maternal deaths, or 60.4 per cent of the deaths in [among] women at-
tended by the midwife.” It was further quite properly pointed out that
“it is an almost impossible task to ascribe responsibility in a large per-
centage of cases. ...”

Percentages from small numbers. An almost classic illustration of the
undesirability of using percentages based upon small numbers is given by
Chaddock.⁸

A short time after Johns Hopkins University had opened certain courses 
in the University to women, it was reported that 33⅓ per cent of the

⁷ See the New York Times, April 12, 1934.

⁸ Robert E. Chaddock, Principles and Methods of Statistics, Houghton Mifflin Co.,
Boston, 1925, pp. 13-14.

women students had married into the faculty of the institution. Of course
the important information was the number of women students. There
were only three. When dealing with a small number of cases, the use of
percentages alone leads to wrong impressions. In these cases either per- 
centages should not be used at all or the numbers upon which they are 
based should accompany the percentages. 

Misplaced decimal points. Mistakes involving misplaced decimal 
points may lead to gross misinterpretations. They are a common sort of 
mistake and should be guarded against. Sir Josiah Stamp⁹ gives a rather
unusual illustration: 

A periodical return of revenue received into the Exchequer was laid 
before Lord Randolph, and his private secretary, Mr. George Gleadowe
of the Treasury, was looking over his shoulder, and Lord Randolph ex-
pressed satisfaction at the fact that the Customs revenue had increased
by 34 per cent, as compared with the corresponding period in the pre-
ceding year. Mr. Gleadowe pointed out to him that it was only .34 per
cent. “What difference does that make?” asked Lord Randolph. When
it was explained to him he said, “I have often seen those damned little
dots before, but I never knew until now what they meant.”

Another example of a misplaced decimal point is mentioned on page 162. 

Arithmetic mistakes. A speaker discussing the upturn in employment 
in May 1934 pointed out that employment had increased by 32 per cent 
during the year. He then said, according to a newspaper, “Cold figures
and percentages don't mean very much to one not accustomed to deal
with them, but these figures simply say that, as to any group of two men
who were employed a year ago, one extra man has been added.” The
proper statement is that, as to any group of three men who were employed
a year ago, one extra man had been added.
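
The speaker's slip is easily checked. In the sketch below, the helper `men_per_added_man` is a name of our own choosing; it converts a percentage increase into the size of the original group to which one man has been added.

```python
def men_per_added_man(percent_increase):
    """Size of the original group for each one extra man added."""
    return 100 / percent_increase


print(men_per_added_man(32))  # 3.125, or roughly one extra man per three
print(men_per_added_man(50))  # 2.0, the increase the speaker's wording implies
```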

Improper averaging of percentages. The occasional necessity for aver- 
aging percentages calls for mention of a pitfall and for consideration of the 
proper procedure. Consider the following figures: 

Foreign-Born Population of the New England States, 1930

State               Total white     Foreign-born        Per cent
                    population      white population    foreign-born

Maine                  795,183          100,368            12.6
New Hampshire          464,350           82,660            17.8
Vermont                358,965           43,061            12.0
Massachusetts        4,192,926        1,054,636            25.2
Rhode Island           677,016          170,714            25.2
Connecticut          1,576,678          382,871            24.3


⁹ Sir Josiah Stamp, Some Economic Factors in Modern Life, p. 265. P. S. King and
Son, London, 1929.

It is desired to know the average proportion of foreign-born white persons
for the New England division. If we add the six percentages and divide
by six, we have 117.1 ÷ 6 = 19.5 per cent. This figure, however, does
not correctly represent the situation; the six percentages were calculated
from different bases and therefore should be weighted accordingly. The 
easiest procedure for obtaining the correct percentage consists of totaling 
the white population for the six states (8,065,113 persons), totaling the 
foreign-born white population (1,834,310 persons), and dividing the second 
figure by the first. The result is 22.7 per cent, which is the proportion of 
foreign-born white persons in the New England division. The same re-
sult could also be obtained by averaging the six percentage figures, pro-
vided each is weighted according to the base from which it has been calcu-
lated. This procedure of multiplying each percentage by its base, sum-
ming the results, and dividing by the sum of the base figures (or weights)
is essentially the same as the method just used. The result, however, is a 
little less accurate since each percentage figure has been rounded. The 
error involved in rounding a given percentage is magnified when the per- 
centage is multiplied. But since some percentages are understated and 
some are overstated, there is a tendency for these errors to counterbalance. 
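
The three procedures just described (the unweighted average, the division of totals, and the weighted average of rounded percentages) may be compared in a brief sketch. The figures are those of the New England table; the variable names are ours.

```python
# state -> (total white population, foreign-born white population), 1930
states = {
    "Maine":         (795_183, 100_368),
    "New Hampshire": (464_350, 82_660),
    "Vermont":       (358_965, 43_061),
    "Massachusetts": (4_192_926, 1_054_636),
    "Rhode Island":  (677_016, 170_714),
    "Connecticut":   (1_576_678, 382_871),
}

# Each state's percentage, rounded to one decimal as printed in the table.
percents = {s: round(fb / total * 100, 1) for s, (total, fb) in states.items()}

# Pitfall: the simple mean treats every state as if it were equally populous.
unweighted = sum(percents.values()) / len(percents)

# Correct: total foreign-born divided by total population.
total_population = sum(total for total, _ in states.values())
total_foreign_born = sum(fb for _, fb in states.values())
direct = total_foreign_born / total_population * 100

# Equivalent in principle: weight each rounded percentage by its base.
weighted = sum(percents[s] * states[s][0] for s in states) / total_population

print(f"unweighted mean of percentages: {unweighted:.1f}")
print(f"ratio of totals:                {direct:.1f}")
print(f"weighted mean of percentages:   {weighted:.2f} (direct: {direct:.2f})")
```

The weighted mean differs slightly from the ratio of totals only because the six percentages were rounded before being weighted.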

Unduly large percentages. While percentages are extremely useful for 
purposes of comparison, percentage figures which are very large may serve 
to confuse. The New York Times for July 27, 1933, states in a headline:
“N. Y. Central Gains 2000% for Month.” The net operating income
for the New York Central Railroad was $192,052 for June 1932 and 
$4,384,965 for June 1933. The latter figure is 2183 per cent greater than 
the former. The use of a figure as great as 2000 per cent is meaningless 
to many people, and a more accurate impression could probably be con- 
veyed by indicating that June 1933 operating income was 22 times larger 
than June 1932 operating income; or, it would also be correct to say that 
June 1933 operating income was 23 times as great as June 1932 operating 
income. 

That the use of large percentage figures may lead to difficulties is vividly 
brought out in the same paper for April 4, 1935. The statement is made 
that a certain bank “grew by 3000 per cent.” The bank referred to was
the Union Trust Company of Pittsburgh, the resources of which mounted
from $100,000 to $300,000,000. The second figure is actually 3000 times
the first figure, or, in percentages, 300,000 per cent of the smaller figure. 
The growth then was 299,900 per cent, or 2999-fold. 
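
The three ways of stating such growth (times as great, per cent of the earlier figure, and per cent increase) are easily confused. A small sketch, with a helper name of our own choosing, keeps them apart:

```python
def describe_growth(old, new):
    """Return (times as great, per cent of old, per cent increase over old)."""
    times_as_great = new / old
    percent_of_old = times_as_great * 100
    percent_increase = (new - old) / old * 100
    return times_as_great, percent_of_old, percent_increase


# New York Central net operating income, June 1932 and June 1933.
times, pct_of, pct_up = describe_growth(192_052, 4_384_965)
print(f"{times:.1f} times as great; {pct_up:.0f} per cent greater")

# Union Trust Company resources: $100,000 growing to $300,000,000.
print(describe_growth(100_000, 300_000_000))
```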

CHAPTER VIII


THE FREQUENCY DISTRIBUTION 


One method of organizing and summarizing statistical data consists in
the formation of a frequency distribution. In this device the various 
items of a series are classified into groups and the number of items falling 
into each group is stated. A frequency distribution is shown in Table 27. 
Sometimes the user of statistics will find frequency distributions already
constructed in the publications to which he may refer; sometimes he will 
construct his own frequency distribution from unclassified data. We shall 
begin our discussion of the frequency distribution by first considering the 
appearance of the raw or unclassified data. 

Raw Data 

The unclassified data from which a frequency distribution might be 
made may appear as do the data of Table 25. Here we have the grades 
made by the 1937 graduating class of the United States Naval Academy 
for the 4-year course. The arrangement of the grades is according to the
alphabetical order of the midshipmen's names, though we have omitted 
the names in order to save space. Another illustration of raw data, from
which a frequency distribution might be constructed, is the payroll of a
factory. The employees on the payroll may be listed alphabetically by 
name; by employee number; by departments, and then by name or num- 
ber; by seniority; or in some other convenient order. Considering the 
grades of the midshipmen as shown in Table 25, it is apparent that very 
little information is forthcoming unless the figures are rearranged. When 
the data are in this form, it is a tedious task to find even the lowest grade 
and the highest grade. It is yet more difficult to ascertain around what 
value the grades tend to concentrate, or if indeed they do show such a con-
centration. These and other steps in analysis are facilitated by rearrang-
ing and summarizing the data.
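
To suggest what such rearranging accomplishes, the sketch below takes ten of the grades listed in Table 25, finds the lowest and highest, and tallies them by groups. The group width of 5 points is an assumption made only for illustration, not a class interval prescribed by the text.

```python
from collections import Counter

# Ten grades read from Table 25, in the order listed there.
grades = [80.7, 71.3, 79.2, 86.3, 76.1, 72.1, 76.3, 72.1, 79.2, 74.6]

# In the raw listing even these two values require a search.
lowest, highest = min(grades), max(grades)
print("lowest:", lowest, "highest:", highest)

# Tally the grades by groups of 5 points (illustrative width).
width = 5
tally = Counter(int(grade // width) * width for grade in grades)
for lower in sorted(tally):
    print(f"{lower} and under {lower + width}: {tally[lower]}")
```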

TABLE 25

Grades Received in the 4-Year Course by Members of the 1937 Graduating
Class of the United States Naval Academy

(Alphabetical listing, names omitted for brevity)

78.9  78.2  71.2  74.8  ?4.6  74.4  85.9  74.2  84.7  77.9
80.7  71.3  79.2  86.3  76.1  72.1  76.3  72.1  79.2  74.6
80.2  78.7  78.2  74.0  72.9  79.0  70.7  74.4  78.5  74.1
76.5  73.0  79.3  73.4  75.5  75.8  72.8  73.5  86.2  76.5
73.5  75.1  79.2  78.0  81.2  71.1  81.1  77.7  78.1  79.6
89.5  76.5  75.4  75.5  81.6  79.2  92.1  85.4  77.4  91.0
79.9  81.5  69.2  85.1  76.7  76.1  74.4  78.8  74.9  85.0
74.9  76.2  81.5  76.3  74.5  82.5  80.8  70.6  74.8  73.6
74.2  73.7  83.2  74.9  75.7  73.4  75.2  70.7  79.5  77.8
89.2  73.3  81.0  90.2  73.0  82.7  76.0  72.2  80.4  77.2
86.4  73.8  71.4  74.7  79.9  80.3  73.3  74.9  78.2  79.8
77.2  81.2  80.4  78.2  77.8  77.6  75.1  73.5  74.8  77.0
80.4  73.9  69.8  82.3  83.4  79.3  81.9  84.4  81.3  72.6
73.2  78.6  76.2  79.7  87.3  84.2  75.5  78.3  79.1  75.3
74.3  84.3  77.8  81.9  81.2  83.3  89.7  85.8  74.2  79.2
82.4  80.8  71.1  75.0  83.7  75.4  78.6  76.8  76.0  74.1
80.4  73.4  71.5  77.4  85.7  76.0  79.0  75.2  77.6  76.5
76.7  76.0  76.2  82.1  82.9  79.0  74.4  75.5  77.3  82.1
76.0  74.5  77.1  79.7  73.0  81.3  77.4  77.6  79.4  81.1
84.2  84.3  72.4  77.6  82.1  72.1  79.1  74.6  71.7  86.5
81.3  80.3  77.8  77.2  78.8  74.8  74.4  76.4  72.9  72.0
75.3  73.7  82.7  82.0  86.8  78.2  77.6  71.8  71.2  73.8
72.3  77.5  71.3  86.5  80.6  86.1  74.2  75.6  76.6  74.0
79.3  71.9  81.9  84.7  73.9  79.1  71.7  78.6  84.5  89.1
74.9  77.5  73.7  72.3  78.0  78.2  77.2  80.4  86.3  74.4
76.3  77.5  83.9  79.7  76.2  81.0  74.9  84.5  83.5  73.5
74.6  75.1  79.1  78.5  82.0  75.4  82.2  73.5  76.4  68.8
86.1  74.4  75.1  71.9  81.5  81.9  73.8  81.1  86.2  77.9
78.7  68.9  78.2  78.9  77.8  78.5  81.0  80.4  78.7  74.5
76.4  80.1  72.9  75.4  72.8  87.0  80.1  77.5  75.2  83.3
75.7  77.4  74.5  82.8  75.9  76.4  77.3  74.4  83.4  71.4
79.6  74.4  72.6  79.8  77.2  73.2  85.0  78.3  85.2  76.6

78 6 

75 1 

85 4 1 

1 

76 4 

86 7 i 

75 7 

1 83.0 



Source: Adapted from [title illegible] of the United States Naval Academy,
1937-1938, pp. 41-46, which gives grades [remainder illegible].


The Array 

In Table 26 the midshipmen's grades have been rearranged in descending
order. Such an arrangement (whether ascending or descending) is called
an array. It arranges the items in order of magnitude. We have not
summarized; that will be done when we construct the frequency distribution.
A consideration of the array puts us in a position to learn something
from the data. First, the array enables us to see at once the range of the
grades, which varied from 68.8 to 92.1. Second, it may also be seen that
there is a concentration of grades somewhere between 74 and 78. This
will be more clearly seen when we examine the frequency distribution and
consider measures of central tendency. Third, a somewhat more extended
examination gives us a rough idea of the distribution of the grades. We
may observe, for example, that there are few grades below 71 or above 86.
This particular feature of the series will be much more readily studied
when we have the frequency distribution. Fourth, it may be noticed that
the figures show a fairly regular continuous change. If the grades are
expressed as whole percentages, all consecutive values from 69 to 87 are
represented. If we consider the figures as shown, to one decimal place,
we may observe only three values not represented from 73.2 to 80.4, within
which range 205 of the 327 midshipmen occur. If the grades had been for a
larger number of students, this tendency would have been more marked.

TABLE 26

Array of Grades Received in the 4-Year Course by Members of the 1937
Graduating Class of the United States Naval Academy

92.1  84.4  81.5  79.7  78.5  77.4  76.1  74.9  74.1  72.6
91.0  84.3  81.5  79.7  78.5  77.4  76.1  74.9  74.0  72.4
90.2  84.3  81.3  79.7  78.5  77.4  76.0  74.9  74.0  72.3
89.7  84.2  81.3  79.6  78.3  77.3  76.0  74.9  73.9  72.3
89.5  84.2  81.3  79.6  78.3  77.3  76.0  74.9  73.9  72.2
89.2  83.9  81.2  79.5  78.2  77.2  76.0  74.8  73.8  72.1
89.1  83.7  81.2  79.4  78.2  77.2  76.0  74.8  73.8  72.1
87.3  83.5  81.2  79.3  78.2  77.2  75.9  74.8  73.8  72.1
87.0  83.4  81.1  79.3  78.2  77.2  75.8  74.8  73.7  72.0
86.8  83.4  81.1  79.3  78.2  77.2  75.7  74.7  73.7  71.9
86.7  83.3  81.1  79.2  78.2  77.1  75.7  74.6  73.7  71.9
86.5  83.3  81.0  79.2  78.2  77.0  75.7  74.6  73.6  71.8
86.5  83.2  81.0  79.2  78.1  76.8  75.6  74.6  73.6  71.7
86.4  83.0  81.0  79.2  78.0  76.7  75.5  74.6  73.5  71.7
86.3  82.9  80.8  79.2  78.0  76.7  75.5  74.5  73.5  71.5
86.3  82.8  80.8  79.1  77.9  76.6  75.5  74.5  73.5  71.4
86.2  82.7  80.7  79.1  77.9  76.6  75.5  74.5  73.5  71.4
86.2  82.7  80.6  79.1  77.8  76.5  75.4  74.5  73.5  71.3
86.1  82.5  80.4  79.1  77.8  76.5  75.4  74.4  73.4  71.3
86.1  82.4  80.4  79.0  77.8  76.5  75.4  74.4  73.4  71.2
85.9  82.3  80.4  79.0  77.8  76.5  75.4  74.4  73.4  71.2
85.8  82.2  80.4  79.0  77.8  76.4  75.3  74.4  73.3  71.1
85.7  82.1  80.4  78.9  77.7  76.4  75.3  74.4  73.3  71.1
85.4  82.1  80.4  78.9  77.6  76.4  75.2  74.4  73.2  70.7
85.4  82.1  80.3  78.8  77.6  76.4  75.2  74.4  73.2  70.7
85.2  82.0  80.3  78.8  77.6  76.4  75.2  74.4  73.0  70.6
85.1  82.0  80.2  78.7  77.6  76.3  75.1  74.4  73.0  69.8
85.0  81.9  80.1  78.7  77.6  76.3  75.1  74.3  72.9  69.2
85.0  81.9  80.1  78.7  77.5  76.3  75.1  74.2  72.9  68.9
84.7  81.9  79.9  78.6  77.5  76.2  75.1  74.2  72.9  68.8
84.7  81.9  79.9  78.6  77.5  76.2  75.1  74.2  72.8
84.5  81.6  79.8  78.6  77.5  76.2  75.0  74.2  72.8
84.5  81.5  79.8  78.6  77.4  76.2  74.9  74.1  72.6

The array, however, is a cumbersome form of the data. Furthermore,
it is troublesome to construct, because of the necessity of rearranging all
the items. One fairly satisfactory method of constructing an array consists
of recording the figures on small cards and sorting the cards. Of
course, if the data are punched on mechanical tabulating cards, the
construction of an array is simple.
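
With a computer the rearrangement is a single sorting step. The following is a minimal sketch in Python, using ten grades taken from Table 25; the variable names are illustrative only and do not come from the text.

```python
# Ten grades taken from Table 25, in their original (alphabetical) order.
raw = [78.9, 78.2, 71.2, 74.8, 74.6, 74.4, 85.9, 74.2, 84.7, 77.9]

# An array is the same set of items arranged in order of magnitude.
array_desc = sorted(raw, reverse=True)

# The extremes, and hence the range, are read off the two ends of the array.
highest, lowest = array_desc[0], array_desc[-1]
spread = round(highest - lowest, 1)

print(array_desc)
print("highest:", highest, "lowest:", lowest, "range:", spread)
```

Nothing is summarized here; every item is still listed, which is why the array remains cumbersome for large sets of data.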

When studying grades, we may frequently want to make an array.
The Naval Academy publishes each year a Merit Roll of the graduating
class, listing the names and standings of the midshipmen in the order which
is given in Table 26. If we are interested in a campaign to raise funds for
a hospital or community chest, it might be very useful (for publicity
purposes, for example) to list the individual gifts in descending order. It is
obvious, however, that such a listing of 500 or 1,000 contributions would
be cumbersome and of limited value. In many instances there is no
particular advantage in making an array. It would be a waste of time
for a concern to make an array of the amounts paid to its employees each
month. There is not much reason why a bank should make an array of the
daily balances of its many depositors. On the other hand, a student of
vital statistics might find it very valuable in a study of birth rates to array
the various cities in ascending or descending order and consider the reasons
for the differences.

The Frequency Distribution

The array of Table 26 rearranged the midshipmen's grades. The frequency
distribution of Table 27 summarizes the grades into 13 groups or
classes. It is obvious that the frequency distribution does not show the
details given in the array, but much is gained by the summarization. We
can see that the lowest grade is not below 68 and that the highest grade is
not quite 94; we cannot ascertain the exact values of the highest and lowest
grades as we did from the array. The concentration of grades in the
neighborhood of 74-78 is apparent at a glance. If we draw a curve of the
frequency distribution, as in Chart 81, we can visualize the data readily and
we may make comparisons with other series as discussed in later sections
of this chapter. Having classified the data, we are in a position to make
rapid computations of certain values (discussed in the following chapters)
which will assist us in describing and analyzing the data.

TABLE 27

Frequency Distribution of Grades of the 1937 Graduating Class of the
United States Naval Academy

Grade          Number of midshipmen

68.0-69.9              4
70.0-71.9             17
72.0-73.9             39
74.0-75.9             62
76.0-77.9             58
78.0-79.9             52
80.0-81.9             35
82.0-83.9             22
84.0-85.9             18
86.0-87.9             13
88.0-89.9              4
90.0-91.9              2
92.0-93.9              1

Total                327

Chart 81. Grades of the 1937 Graduating Class of the United States Naval Academy.
(Data of Table 27.) Axes: number of midshipmen against grade, 68 to 94.

When an array is available, the frequency distribution may be made by 
merely counting the items. It is not advisable, however, to make an array 
solely for the purpose of making the frequency distribution, because too 
great an amount of time is required to construct the array. 
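
The counting of items into classes can also be done directly from the unclassified data. The sketch below is an illustration, not a procedure given in the text: it uses twenty grades taken from Table 25 and the 2 per cent classes of Table 27, and the helper name `lower_limit` is hypothetical.

```python
from collections import Counter

# A sample of twenty grades taken from Table 25 (not the full class of 327).
grades = [78.9, 78.2, 71.2, 74.8, 74.6, 74.4, 85.9, 74.2, 84.7, 77.9,
          80.7, 71.3, 79.2, 86.3, 76.1, 72.1, 76.3, 72.1, 79.2, 74.6]

START, WIDTH = 68, 2   # classes 68.0-69.9, 70.0-71.9, ... as in Table 27

def lower_limit(grade):
    """Lower limit of the class that contains `grade`."""
    tenths = round(grade * 10)                    # integer tenths avoid float error
    steps = (tenths - START * 10) // (WIDTH * 10)
    return START + steps * WIDTH

# Counting the items class by class gives the frequency distribution.
tally = Counter(lower_limit(g) for g in grades)

for low in sorted(tally):
    print(f"{low}.0-{low + WIDTH - 1}.9  {tally[low]}")
```

Only the count in each class is kept, so the individual grades can no longer be recovered from the result.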

If the data are in unorganized form, as in Table 25, we may construct a
frequency distribution by a scoring device similar to that shown in Chapter
II. Another method of handling the figures consists of making an entry
form such as that of Table 28. This is less laborious than making an array
and has certain advantages over the scoring procedure. The advantages
of the entry form are: (1) We can scan the columns to see if any item is
incorrectly entered. (2) We can total the items entered and check this
total against the total of the unclassified data. (3) If we should decide
that we want classes of 1 per cent or 3 per cent instead of 2 per cent, we
can re-form our frequency distribution with little effort. (4) As will be
shown in the next chapter, the entry form enables us to find out how closely
the mid-value of a class agrees with the average of the items in that class.

TABLE 28

Entry Form for Grades of the 1937 Graduating Class of the United States
Naval Academy

(Each column of the form is printed here as a row, with the entries in the
order in which they were made)

68.0-69.9:  68.9 69.2 69.8 68.8

70.0-71.9:  71.4 71.3 71.9 71.2 71.4 71.1 71.5 71.3 71.9 71.1 70.7 71.7
            70.6 70.7 71.8 71.7 71.2

72.0-73.9:  73.5 73.2 72.3 73.6 73.7 73.3 73.8 73.9 73.4 73.7 72.4 73.7
            72.9 73.4 72.3 72.6 72.9 73.0 73.0 73.9 72.8 72.1 73.4 72.1
            72.8 73.3 73.8 73.2 72.1 73.5 72.2 73.5 73.5 72.9 73.6 72.6
            72.0 73.8 73.5

74.0-75.9:  74.9 74.2 74.3 75.3 74.9 74.6 75.7 75.1 74.5 75.1 74.4 75.4
            75.1 74.5 74.4 74.8 74.0 75.5 74.9 74.7 75.0 75.4 75.1 74.6
            75.5 74.5 75.7 75.9 74.4 75.8 75.4 74.8 75.4 74.4 75.2 75.1
            75.5 74.4 74.4 74.2 74.9 74.2 74.4 74.9 75.2 75.5 74.6 75.6
            74.4 75.7 74.9 74.8 74.8 74.2 75.2 74.6 74.1 75.3 74.1 74.0
            74.4 74.5

76.0-77.9:  76.5 77.2 76.7 76.0 76.3 76.4 76.5 76.2 76.0 77.5 77.5 77.5
            77.4 76.6 76.2 77.8 76.2 77.1 77.8 76.3 77.4 77.6 77.2 76.1
            76.7 77.8 76.2 77.8 76.1 77.6 76.0 76.4 77.2 76.4 76.3 76.0
            77.4 77.6 77.2 77.3 77.7 76.8 77.6 76.4 77.5 77.4 76.0 77.6
            77.3 76.6 76.4 77.9 76.5 77.8 77.2 77.0 76.5 77.9

78.0-79.9:  78.9 79.9 79.3 78.7 78.2 78.7 78.6 79.6 79.2 78.2 79.3 79.2
            79.1 78.2 78.6 78.0 78.2 79.7 79.7 79.7 78.5 78.9 79.9 78.8
            78.0 79.8 79.0 79.2 79.3 79.0 78.2 79.1 78.2 78.5 78.6 79.0
            79.1 78.8 78.3 78.6 79.2 78.5 78.1 79.5 78.2 79.1 79.4 78.7
            78.3 79.6 79.8 79.2

80.0-81.9:  80.7 80.2 80.4 80.4 81.3 81.5 81.2 80.8 80.3 80.1 81.5 81.0
            80.4 81.9 81.9 81.2 81.6 81.2 80.6 81.5 80.3 81.3 81.0 81.9
            81.1 80.8 81.9 81.0 80.1 80.4 81.1 80.4 80.4 81.3 81.1

82.0-83.9:  82.4 83.2 82.7 83.9 82.3 82.1 82.0 82.8 83.4 83.7 82.9 82.1
            82.0 82.5 82.7 83.3 82.2 83.5 83.4 83.0 82.1 83.3

84.0-85.9:  84.2 85.2 84.3 84.3 85.1 84.7 85.7 85.4 84.2 85.9 85.4 84.4
            85.8 84.5 85.0 84.7 84.5 85.0

86.0-87.9:  86.4 86.1 86.3 86.5 87.3 86.8 86.1 87.0 86.7 86.2 86.3 86.2
            86.5

88.0-89.9:  89.5 89.2 89.7 89.1

90.0-91.9:  90.2 91.0

92.0-93.9:  92.1

TABLE 29

Average Hourly Earnings of 13,427 Wage Earners in Open-Hearth Furnaces, 1935

Average hourly earnings      Number of       Frequency densities,
(cents)                      wage earners    number of wage earners
                                             per 2.5 cents of earnings

 25.0 and under  27.5               1               1
 27.5 and under  30.0              27              27
 30.0 and under  32.5              13              13
 32.5 and under  35.0              43              43
 35.0 and under  37.5              90              90
 37.5 and under  40.0              98              98
 40.0 and under  42.5             289             289
 42.5 and under  45.0             360             360
 45.0 and under  47.5             779             779
 47.5 and under  50.0           1,284           1,284
 50.0 and under  55.0           1,437             718.50
 55.0 and under  60.0           1,263             631.50
 60.0 and under  65.0           1,134             567.00
 65.0 and under  70.0           1,081             540.50
 70.0 and under  75.0             957             478.50
 75.0 and under  80.0             777             388.50
 80.0 and under  85.0             613             306.50
 85.0 and under  90.0             577             288.50
 90.0 and under 100.0             809             202.25
100.0 and under 110.0             546             136.50
110.0 and under 120.0             287              71.75
120.0 and under 130.0             319              79.75
130.0 and under 140.0             202              50.50
140.0 and under 150.0             129              32.25
150.0 and under 160.0              83              20.75
160.0 and under 170.0              82              20.50
170.0 and over                    147

Total                          13,427

Source: Monthly Labor Review, Vol. 42, No. 4 (April 1936), p. 1045.

If desired, the classes used in the entry form may be narrower than we
think we shall want for the frequency distribution. These classes may then
be readily combined into wider ones, using whatever interval and whatever
class limits seem advisable.
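
The idea of the entry form can be sketched in Python as a mapping from each class to the list of items entered in it. This is an illustration under stated assumptions: the function `entry_form` and the sample of twenty grades from Table 25 are hypothetical conveniences, not part of the text. The sketch enters the grades in narrow 1 per cent classes, checks the total, and then combines the narrow classes into 2 per cent classes without going back to the unclassified data.

```python
from collections import defaultdict

# A sample of twenty grades taken from Table 25 (not the full class of 327).
grades = [78.9, 78.2, 71.2, 74.8, 74.6, 74.4, 85.9, 74.2, 84.7, 77.9,
          80.7, 71.3, 79.2, 86.3, 76.1, 72.1, 76.3, 72.1, 79.2, 74.6]

def entry_form(values, start, width):
    """Enter each value under the lower limit of its class, keeping the value."""
    form = defaultdict(list)
    for v in values:
        tenths = round(v * 10)                    # integer tenths avoid float error
        low = start + ((tenths - start * 10) // (width * 10)) * width
        form[low].append(v)
    return dict(form)

narrow = entry_form(grades, start=68, width=1)    # 1 per cent classes

# Advantage (2): total the items entered and check against the raw data.
entered = sum(len(items) for items in narrow.values())
print("all items entered:", entered == len(grades))

# Advantage (3): combine the narrow classes into 2 per cent classes.
wide = defaultdict(list)
for low, items in narrow.items():
    wide[68 + ((low - 68) // 2) * 2].extend(items)

for low in sorted(wide):
    print(f"{low}.0-{low + 1}.9  {sorted(wide[low])}")
```

Because the items themselves are kept, the average of the items in any class can later be compared with the mid-value of that class.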

All the class intervals of the frequency distribution of Table 27 are 2
per cent. Charting and computations are facilitated when the class intervals
are all the same. Whenever possible, therefore, frequency distributions
should be constructed with uniform class intervals. This, however,
is not always practicable (see for example Table 42). In Table 29 there
is also shown a frequency distribution which has non-uniform class intervals.
In this instance the result is to give more detailed information for
the groups having lower earnings.

Chart 82. Mileage of 98 Automobile Tires, Size 4.75x19; 1,000-Mile Class
Intervals. (Data from a confidential source. The tires were used by a fleet
of delivery trucks in and around New York City.) Vertical axis: number of tires.

Selecting the number of classes. No hard and fast rule can be given as
to the number of classes into which a frequency distribution should be
divided. If there are too many classes, many of them will contain only a
few frequencies and the distribution will show noticeable irregularity
when plotted. If there are too few classes, so many frequencies will be
crowded into a class as to cause much information to be lost. Chart 82
shows a series which is rather irregular because the mileage records of 98
automobile tires were grouped into 32 classes each 1,000 miles in width.
The curve assumes a much more regular outline in Chart 83 when the
same data are put into 12 classes of 3,000 miles each. The number of
classes to use depends upon the number of frequencies in the series and the
regularity with which the frequencies are distributed within the range of
values. The greater the number of frequencies, the more classes we may 
have. Also, the more regular the distribution of the frequencies, the more 
classes we may use, since data having a high degree of regularity may be 
divided into a large number of classes without showing gaps and irregu- 
larities in the frequencies. In general it might be said that fewer than 6 or 
8 classes should rarely be used, and that more than 16 classes would be 
useful only for working with extensive data. When the number of classes
has been determined, the range of values for the entire distribution
indicates the class interval to be used.

Chart 83. Mileage of 98 Automobile Tires, Size 4.75x19; 3,000-Mile Class
Intervals. (The vertical scale for this chart is one-third that of Chart 82 in order to
compensate for the fact that the class intervals used for this chart are three times those
of the preceding chart. For source of data see Chart 82.) Vertical axis: number of tires.
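
The relation between the range, the number of classes, and the class interval can be illustrated with the midshipmen's grades. The sketch below assumes the 13 classes of Table 27 and the extremes read from the array; rounding the quotient up to a convenient whole number is our assumption, since the text gives no fixed rule.

```python
import math

lowest, highest = 68.8, 92.1     # extremes of the grades, read from the array
n_classes = 13                   # number of classes used in Table 27

# Range divided by the number of classes indicates the interval required.
raw_interval = (highest - lowest) / n_classes

# Assumption: round up to a convenient whole number so the classes cover the range.
interval = math.ceil(raw_interval)

print(round(raw_interval, 2), interval)
```

Starting the first class at 68, thirteen classes of 2 per cent reach to 94 and so take in the highest grade of 92.1.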

Selecting class limits. It was pointed out in Chapter IV that the mid- 
value of each class is taken to be representative of the class. The mid- 
values of the classes are made use of not only when charting the frequency 
distribution, but also in making various computations to be discussed in 
later chapters. If the limits of each class are not clearly indicated, the 
mid-value, which is the average of the upper and lower limits, cannot be 
properly determined. The adequacy of the mid-value assumption will 
be discussed more fully in Chapter IX. It is important at this point to 
make clear that, when a frequency distribution is being constructed, the 
class limits should be so chosen that the mid-value of each class will
coincide, so far as possible, with any values around which the data tend to
be concentrated.

Suppose that measurements are made of the academic standing of a
large group of college freshmen upon a numerical scale ranging from 0 to
100. The data could be expected to be graduated smoothly from, say,
30 to nearly 100. There would be students rating 88.0 and others 89.0; 
in addition there would be still others falling between these two values. 
If a large enough group were to be measured the minuteness of the varia- 
tions between 88.0 and 89.0 would be limited only by the accuracy of the 
measuring instrument (in this case, the grading system). There would 
not be a series of values around which the frequencies would tend to con- 
centrate, and the problem mentioned at the end of the preceding para- 
graph would not arise. 

On the other hand, consider the meal checks of a cafeteria, many 
(but not all) of which are a multiple of 5 cents. In this instance the class 
intervals should be written 8-12 cents, 13-17 cents, 18-22 cents, etc., 
thus giving mid-values of 10 cents, 15 cents, 20 cents, etc., which coincide 
with the concentration points. 
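
The effect of the choice of limits on the mid-values can be checked with a short computation. This is a sketch only; the second set of limits is a hypothetical poor choice added for contrast and does not appear in the text.

```python
# Class limits in whole cents, as suggested for the cafeteria meal checks.
classes = [(8, 12), (13, 17), (18, 22)]

# The mid-value is the average of the lower and upper limits of the class.
mid_values = [(low + high) / 2 for low, high in classes]
print(mid_values)        # [10.0, 15.0, 20.0]

# Hypothetical alternative: the 5-cent multiples now fall at the class edges,
# so the mid-values no longer coincide with the concentration points.
shifted = [(5, 9), (10, 14), (15, 19)]
shifted_mid_values = [(low + high) / 2 for low, high in shifted]
print(shifted_mid_values)   # [7.0, 12.0, 17.0]
```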

The data of freshmen grades and the ratings of midshipmen are illustra- 
tions of what is termed a continuous variable, since the values are capable 
of infinitely small variations from each other. Heights and weights of 
people are also continuous variables. Length of life is another illustration. 
The data of cafeteria meal checks are illustrative of discrete or discontinuous 
data, since the values differ from each other by finite amounts — in this 
case, one cent. A discrete variable need not show the concentrations 
which were present in the meal check data. For example, if a large group 
of workmen are employed at similar tasks and are paid on a piece-rate
basis (that is, upon the basis of amount produced), it is quite possible that
there may be individuals receiving $21.21, $21.22, $21.23, etc., for a week's
work. Although piece rates might be, and often are, in fractions of a cent,
the weekly payment must be in terms of whole cents. 

The foregoing suggests an important consideration; namely, that we 
are not so much concerned with the fact that a variable is discrete as 
we are with the fact that the data may be broken and that there are in- 
herent gaps and concentrations in the actual data in hand. The twenty- 
second annual report (for the year 1935) of the Board of Governors of the 
Federal Reserve System lists the salaries paid to all of the 328 officers and 
employees of the Board of Governors. These salaries range from $840 
per annum to $15,000. There is in no sense an evenly graduated distribu- 
tion between these limits. The gaps between adjacent values range from 
$10 to $5,000, and there are pronounced concentrations at various custom- 
ary salaries such as $1,500, $1,800, $2,000, $2,500, $3,000, $3,600, etc. 
The selection of class limits for a distribution of this type presents great 
difficulty, as often it is not possible to adjust the mid-values to coincide
with all concentration points. An approximate adjustment must then
suffice.

The fact that we may be dealing with a continuous variable does not
warrant us in selecting class limits blindly. If data are being collected
concerning weights of individuals, reported to the nearest pound, persons
reported as weighing 142 pounds would vary between 141.5 pounds and
142.5 pounds; as a group they would average about 142 pounds. Suppose,
however, that weight is reported to the last full pound. In that event
persons reported as weighing 142 pounds would vary between exactly 142
pounds and just under 143 pounds; as a group they would average about
142.5 pounds. Let us assume that a frequency distribution with class
interval of 3 pounds is to be formed. If weights have been reported to
the nearest pound, it is correct to write class intervals “142-144, 145-147,
148-150,” etc., with mid-values of 143, 146, 149, etc. If, however, weights
have been reported to the last full pound, the above is incorrect, but it is
correct to write “142 and under 145, 145 and under 148, 148 and under
151,” etc., with mid-values of 143.5, 146.5, 149.5, etc.
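
The two sets of mid-values follow from the true limits implied by each method of reporting. A minimal sketch, in which the function `mid_value` and its `recorded` argument are hypothetical names introduced for illustration:

```python
def mid_value(low, high, recorded="nearest"):
    """Mid-value of a class whose stated whole-pound limits are low..high.

    recorded="nearest":   a stated weight w covers w - 0.5 up to w + 0.5.
    recorded="last_full": a stated weight w covers w up to just under w + 1.
    """
    if recorded == "nearest":
        true_low, true_high = low - 0.5, high + 0.5
    elif recorded == "last_full":
        true_low, true_high = low, high + 1
    else:
        raise ValueError("recorded must be 'nearest' or 'last_full'")
    return (true_low + true_high) / 2

for low, high in [(142, 144), (145, 147), (148, 150)]:
    print(low, high,
          mid_value(low, high, "nearest"),
          mid_value(low, high, "last_full"))
```

The same stated weights thus give mid-values half a pound apart, according to the way in which the original measurements were recorded.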

Two cautions should be noted concerning the writing of class limits.
In the first place, designations such as “$50-$100,” “$100-$150,” etc.,
should never be used, since the limits overlap. A reader cannot be sure
whether $100 belongs in the first class or the second. If tally sheets or
entry forms are made with such overlapping class limits, it is possible that
$100 items may sometimes be placed in the $50-$100 class and sometimes
in the $100-$150 class. If the data are originally given in dollars, the class
intervals for both the work sheet and the frequency distribution should
read “$50-$99,” “$100-$149,” etc. If the data are in dollars and cents, the
class intervals should read “$50.00-$99.99,” “$100.00-$149.99,” etc. In
rare instances an investigator will write his class limits “$50.01-$100.00,”
“$100.01-$150.00,” etc. The former arrangement is more desirable. The
second caution is to avoid writing class limits that leave gaps between
classes. For example, the wording “over $50 but under $100,” “over
$100 but under $150,” is incorrect because $100 is excluded from both
classes.
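
Both cautions can be checked mechanically once the limits are written as inclusive values in the unit of the data. The following is a sketch; `check_limits` is a hypothetical helper, and "over $50 but under $100" is represented as the whole-dollar limits 51 to 99.

```python
def check_limits(classes, unit=1):
    """Report overlaps and gaps between consecutive classes.

    classes: list of (lower, upper) limits, both inclusive, in ascending order.
    unit:    smallest step in which the data are recorded (1 for whole dollars).
    """
    problems = []
    for (low1, high1), (low2, high2) in zip(classes, classes[1:]):
        if low2 <= high1:
            problems.append(f"overlap: {low2} falls in two classes")
        elif low2 - high1 > unit:
            problems.append(f"gap: values between {high1} and {low2} fit no class")
    return problems

print(check_limits([(50, 100), (100, 150)]))   # first caution: overlapping limits
print(check_limits([(51, 99), (101, 149)]))    # second caution: $100 is excluded
print(check_limits([(50, 99), (100, 149)]))    # correctly written limits
```

For data in dollars and cents the same check can be applied with the limits expressed in whole cents.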

Curves of frequency distributions. The graphic representation of a 
frequency distribution was discussed in Chapter IV. Although a fre- 
quency distribution may be represented either by a column diagram or a 
curve, it is usual to employ the latter device. (We shall make use of the 
column diagram in Chapter XI.) One advantage of the curve is that two 
or more curves may readily be drawn on the same axes for purposes of 
comparison. In any event, the first step in the analysis of a frequency 
distribution should be the construction of a chart, for it will tell us at a 
glance with which of the following types of distributions we are dealing.

Chart 81 shows us the graphic appearance of the data of midshipmen's
grades which are shown in Table 27. Although rather regular in appearance,
this curve is not symmetrical, but is slightly skewed to the right.
(Skewness is discussed in Chapter X.) Many frequency distribution
curves encountered in the social sciences are asymmetrical and frequently
are skewed to the right. Only rarely do we find a curve skewed to the left.

Biological and anthropometrical series (especially those involving linear
measurements, such as height, rather than two- or three-dimension
measurements, such as waist circumference or weight), frequently yield
curves which are almost symmetrical. Witness the curve of the basal
diameter of the egg masses of snails, shown as Chart 116. Another
roughly symmetrical series is shown in Chart 84, which pictures the height
distribution of a large group of male industrial workers.


Chart 84. Heights of 9,552 Male Industrial Workers. (Data from A Health Study
of Ten Thousand Male Industrial Workers, p. 59. United States Public Health Service,
Public Health Bulletin No. 162.) Axes: number of male workers against height in
inches, 56 to 78.

A curve which is skewed to the left appears in Chart 85, which shows the 
batting averages of 881 players of the two major leagues (the American 
League and the National League) and the three leading minor leagues
(the American Association, the International League, and the Pacific 
Coast League). These 881 players include all those who had played in 
10 games or more and who had been at bat at least 25 times. Thus the 
figures include both substitutes and regular players. If the substitutes 
are excluded from the data, the resulting curve for regular players ap- 
proaches symmetry (see Chart 119). This difference is due, of course, to 
the fact that the substitute players, as a group, are not such good batters 
as are the regulars. 

Because it roughly resembles the letter J, the curve of Chart 86 is termed
a “J curve.” Notice that the death rates from accidental causes were
lower for younger persons and higher for persons who were older. The
slight upturn at the very left of this curve is not always present in a J
curve.

The shape of the curve of Chart 87 is essentially the reverse of the one
just mentioned. Such a curve may be termed a “reverse J curve.” The
curve in Chart 87 indicates the length of time during which cars were
parked in the Loop district of Detroit, and shows a great many cars parked
for short periods and generally smaller numbers parked for longer lengths
of time.

Chart 85. Batting Averages of 881 Major and Minor League Players, 1936. (Data
compiled from newspaper summaries by David L. Holbein. The players included are
those who participated in 10 or more games and who were at bat 25 times or more.
The minor leagues included were the International League, the American Association,
and the Pacific Coast League.) Vertical axis: number of men.

Another type of curve which is occasionally observed has large numbers 
of cases at each end of the X-scale and smaller frequencies for the inter- 
mediate X values. The term “U curve” is applied to frequency polygons 
of this shape. Chart 88 depicts a U curve showing for the state of Michigan
the proportion of males in each age group who were unemployed in
January 1935. A bimodal curve, showing two concentrations of frequencies,
appears as Chart 100 and is discussed in the accompanying text. This
is a rather unusual form and need not be considered here.

Chart 86. Death Rates from Accidents, by 5-Year Age Groups (for the Registration
States of 1920), 1933. (Data from United States Bureau of the Census.)
Horizontal axis: age in years, 0 to 100.

Plotting a frequency curve when the class intervals are unequal. In the
case of certain frequency distributions it is not feasible to maintain the
same class interval throughout. The distribution of Table 29 has ten
classes of 2.5 cents, eight classes of 5 cents, eight classes of 10 cents, and
one class of indeterminate width. It would not have been desirable to
use 2.5 cent intervals throughout since that would have necessitated
fifty-eight classes to cover the range from 25 cents to 170 cents. Not only
would there be far too many classes to be useful, but many classes would
include no, or very few, frequencies. Class intervals of 5 cents throughout
would not be desirable either, since details concerning those having average
hourly earnings of less than 50 cents would be lost, in spite of the fact that
there would be twenty-nine classes instead of the original twenty-six to
cover the range 25 cents to 170 cents.

Chart 87. Parking Time of Motor Vehicles in a Congested Area of Detroit, 1927.
(Data from a chart in Facts and Figures of the Automobile Industry, 1928 Edition, p. 84,
National Automobile Chamber of Commerce, now the Automobile Manufacturers
Association.) [Axes: thousands of cars against hours parked.]

Chart 88. Per Cent of Employable Males Who Were Unemployed, by Age Groups,
Michigan, January 1935. (Data from Monthly Labor Review, Vol. 43, No. 5, November
1936, p. 1160.) [Axes: per cent unemployed against age in years.]

When it is desired to construct a curve of data such as those in Table 29,
it is necessary to make adjustments because of the varying widths of class
intervals. The class “50.0 and under 55.0 cents” is twice as wide as those
which precede it. We do not know how many of the 1,437 wage earners
received between 50.0 and 52.5 cents per hour, and how many from 52.5
to 55.0 cents. We can say, however, that on the average there were 718.5
wage earners in each of the two halves of the class “50.0 and under 55.0
cents.” Adjustments of this type have been made in the last column of
Table 29 and give us frequencies for each 2.5 cents of the series. These
may be thought of as frequency densities per 2.5 cents of hourly earnings.
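The adjustment just described can be sketched in a few lines of Python. This is only an illustration: the function name `density_per_unit` and its parameters are hypothetical, and the only figures taken from the text are the 1,437 wage earners in the 5-cent class and the 2.5-cent standard width.

```python
def density_per_unit(frequency, class_width, unit_width=2.5):
    """Frequency per `unit_width` of the variable for a single class.

    A class twice as wide as the standard unit has its frequency halved,
    a class four times as wide has it quartered, and so on.
    """
    return frequency * unit_width / class_width


# The class "50.0 and under 55.0 cents" is 5 cents wide and holds 1,437 men.
print(density_per_unit(1437, 5.0))   # 718.5 men per 2.5 cents

# A class that already has the standard width is left unchanged.
print(density_per_unit(100, 2.5))    # 100.0
```

An open-end class has no known width, so no density can be computed for it; the sketch deliberately offers no default for `class_width`.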

The distribution may now be plotted in terms of the frequency densities, 
as in Chart 89. No estimate can well be made of the width of the last 
class interval of this distribution, and consequently no adjustment has 
been made in the table. Notice how attention was called to the presence 
of these 147 wage earners on the chart. An alternate method consists of 
putting a number or symbol at the end of the curve and showing the same 
information in a footnote to the chart. One study which necessitated the
rather frequent use of open-end classes included an appendix giving further
details concerning the items falling in those classes.*


Chart 89. Frequency Densities of Hourly Earnings of 13,427 Wage Earners in
Open-Hearth Furnaces, 1935. A. Column diagram; B. Frequency curve. (Data of Table 29.)
[Axes: number of men per 2.5 cents against average hourly earnings in cents.]

* W. A. Paton, Corporate Profits as Shown by Audit Reports, Appendix B (pp. 120-124),
National Bureau of Economic Research, New York, 1935.

Chart 90. Weekly Earnings of Male Employees of Folding-Paper-Box Factories in
the Northern and Southern Portions of the United States, August 1935. (Data of
Table 30.) [Axes: number of employees against dollars per week.]

TABLE 30

Weekly Earnings of Male Employees of Folding-Paper-Box Factories in the
Northern and Southern Portions of the United States, August 1935

                       Number of employees     Per cent of total
Weekly earnings          North      South        North     South
Under $4                    49          4           .9       1.0
$ 4 but under  8            85          6          1.5       1.4
  8 but under 12           160         43          2.8      10.3
 12 but under 16           385        178          6.9      42.6
 16 but under 20         1,628         88         29.0      21.1
 20 but under 24         1,176         28         21.0       6.7
 24 but under 28           742         34         13.2       8.1
 28 but under 32           427         12          7.6       2.9
 32 but under 36           345         11          6.1       2.6
 36 but under 40           193          6          3.4       1.4
 40 but under 44           163          4          2.9       1.0
 44 but under 48           101          3          1.8        .7
 48 and over               162          1          2.9        .2
Total                    5,616        418        100.0     100.0

Source: United States Bureau of Labor Statistics, Bulletin No. 620, Wages, Hours, and
Working Conditions in the Folding-Paper-Box Industry, 1933, 1934, and 1935, pp. 29, 76, and 80.

Comparison of frequency distributions. Table 30 shows two frequency
distributions: one giving the distribution of weekly earnings of male
employees in the folding-paper-box industry in the northern United States
and the other presenting a distribution of weekly earnings of male
employees in the same industry in the southern United States. It will be
observed that the first series refers to 5,616 males, while the second includes
but 418. If the two series dealt with approximately the same total
frequencies (that is, about the same number of men), we could merely plot
Chart 91. Percentage Distributions of Weekly Earnings of Male Employees of
Folding-Paper-Box Factories in the Northern and Southern Portions of the United
States, August 1935. (Data of Table 30.) [Axes: per cent against dollars per week.]

two frequency curves on the same axes and study their outlines. The 
result of doing this for these two series is shown in Chart 90. The com- 
parison is not particularly illuminating, although it is obvious that the 
most prevalent earnings were between $12 and $16 per week in the South 
and between $16 and $20 per week in the North. Because of the wide 
difference in numbers included in the two series, we can make a more 
meaningful comparison of the two curves if we express the frequencies in 
each class as percentages of their respective totals. This has been done 
in the last two columns of Table 30 and the resulting percentage frequency 
distributions have been plotted in Chart 91. The relative importance of
each earnings class is now set forth clearly. Both in the $8 and under $12
and in the $12 and under $16 classes there was a larger proportion in the
South than in the North. In each class beyond $16 there were smaller
proportions in the South than in the North.
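As a rough check on this comparison, the percentage frequencies can be recomputed from the class frequencies. The sketch below assumes the frequencies as read from Table 30; the helper name `percentage_distribution` and the list names are illustrative only.

```python
# Class frequencies as read from Table 30 (Under $4, $4-8, ..., $48 and over).
north = [49, 85, 160, 385, 1628, 1176, 742, 427, 345, 193, 163, 101, 162]
south = [4, 6, 43, 178, 88, 28, 34, 12, 11, 6, 4, 3, 1]


def percentage_distribution(freqs):
    """Express each class frequency as a percentage of the series total."""
    total = sum(freqs)
    return [100 * f / total for f in freqs]


north_pct = percentage_distribution(north)
south_pct = percentage_distribution(south)

# Print the two percentage series side by side, rounded as in the table.
for n, s in zip(north_pct, south_pct):
    print(f"{n:6.1f} {s:6.1f}")

# Position of the most prevalent (modal) class in each series:
# index 4 is "$16 but under 20", index 3 is "$12 but under 16".
print(north.index(max(north)), south.index(max(south)))
```

Because each series is divided by its own total, the two percentage columns are directly comparable even though one series is more than thirteen times as large as the other.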

The comparison of the two series shown in Chart 91 was facilitated
because the class intervals were the same in each series. If a series of
$3 class intervals is being compared with one having $6 intervals, pairs of
classes of the first series may be combined, and the comparison thus made
in terms of $6 intervals. Alternatively, the frequencies (or percentage
frequencies) of the series of $6 class intervals could be distributed on the
basis of $3 intervals. The frequency of each $6 class interval could be
divided by 2, and this frequency assigned to each half of the interval. It
would also be possible to use two vertical scales in inverse proportion to the
class intervals. This scheme is not often used and may tend to mislead,
as may the use of two (or more) vertical scales on any arithmetic chart.

Chart 92. Cumulative Distribution of Grades of the 1937 Graduating Class of the
United States Naval Academy, Showing Number of Midshipmen Receiving Less than
Stated Grade. (Data of Table 31.) [Axes: number of midshipmen against grade.]

Sometimes, however, the class intervals are not multiples of each other
as in the instance just mentioned. One series may have class intervals of
$2, while another has intervals of $3. We can then make the areas under
the two curves the same by computing frequency densities; that is, by
expressing the frequencies in each class in terms of frequencies per dollar.
Thus the frequencies of each class of the first series would be divided by 2,
while the frequencies of each class of the second series would be divided
by 3. These new frequency density figures for a class interval really
refer to each dollar width of that interval; they should, of course, be plotted
at the mid-value of the class. If the number of items in the two series
is appreciably different, we may compute percentage frequencies and
express these in terms of per dollar of class interval.

When two frequency distributions are expressed in terms of different
units (dollars, pounds, inches, etc.), a direct graphic comparison is not
feasible since there is no simple way in which the X-scales may be adjusted
to each other. Certain computed values, to be discussed later, may be
used to obtain effective numerical comparison.

TABLE 31

Cumulative Distribution of Grades of the 1937 Graduating Class of the
United States Naval Academy, Showing Number of Midshipmen Receiving
Less than Stated Grade

                  Number of     Per cent
Grade*            midshipmen    of total
Less than 70           4           1.2
Less than 72          21           6.4
Less than 74          60          18.3
Less than 76         122          37.3
Less than 78         180          55.0
Less than 80         232          70.9
Less than 82         267          81.7
Less than 84         289          88.4
Less than 86         307          93.9
Less than 88         320          97.9
Less than 90         324          99.1
Less than 92         326          99.7
Less than 94         327         100.0

* As pointed out in Ch. IX, the upper limits are 69.9499..., 71.9499...,
etc. When rounded to whole percentages, these become 70, 72, etc.

Cumulative frequency distributions and the ogive. The data of Table 
27 show the usual (non-cumulative) form of the frequency distribution and 
enable us to ascertain the number of midshipmen falling in each class. 
Sometimes, however, it may be useful to know how many or what propor- 
tion of students received less than certain stated grades, and this informa- 
tion may be seen clearly in a cumulative table such as Table 31. In this 
table the frequencies of Table 27 have been accumulated upon a “less than”
basis; we may note, for example, that 232, or 70.9 per cent, had grades
below 80, and that 60, or 18.3 per cent, had grades below 74. If the
figures of a cumulative frequency distribution are shown by means of a
curve, the result is called an ogive. Either the absolute or relative figures
may be plotted. Chart 92 shows the ogive of the data of Table 31. When
we drew a curve of the non-cumulative series, the frequencies of each class
were plotted in relation to the mid-value of the class. Now, however,
our cumulative frequencies in each class refer to a single value and, since 4
midshipmen had grades of less than 70, we plot the first point of the curve
at 4 on the Y-axis and at 70 on the X-axis, and similarly for the other
cumulative frequencies.
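The “less than” accumulation and the plotting points of the ogive can be sketched as follows. The class frequencies are those listed for the midshipmen's grades in Table 37; the variable names are illustrative, and the whole-number class limits (68, 70, and so on) are the rounded limits used in Table 31.

```python
from itertools import accumulate

# Rounded lower limits of the classes (68, 70, ..., 92) and class frequencies.
lower_limits = list(range(68, 94, 2))
freqs = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]

# A "less than" figure belongs to the upper limit of its class.
upper_limits = [lo + 2 for lo in lower_limits]
less_than = list(accumulate(freqs))
total = less_than[-1]

# Points of the ogive: (grade on the X-axis, cumulative number on the Y-axis).
ogive_points = list(zip(upper_limits, less_than))
percent = [round(100 * c / total, 1) for c in less_than]

for (grade, count), pct in zip(ogive_points, percent):
    print(f"Less than {grade}: {count:4d}  {pct:5.1f}")
```

Note that each cumulative figure is paired with a class limit, not with a mid-value, which is the distinction the paragraph above draws between the ogive and the ordinary frequency curve.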

Instead of wishing to know how many students received less than certain
specified grades, we may wish to know how many (or what proportion)
received given grades or above. The data of midshipmen's grades have
been cumulated upon an “or more” basis in Table 32. Now it may be
observed that 323, or 98.8 per cent, had grades of 70 or more; that 95, or
29.1 per cent, had grades of 80 or more; etc. The ogive for this cumulative
distribution is shown in Chart 93. Again the frequencies are plotted
against the stated values. For the first group, 327 on the Y-axis is plotted
against 68 on the X-axis; for the second group, 323 is plotted against 70;
and so on.

TABLE 32

Cumulative Distribution of Grades of the 1937 Graduating Class of the
United States Naval Academy, Showing Number of Midshipmen Receiving
Stated Grade or Above

                  Number of     Per cent
Grade*            midshipmen    of total
68 or more           327         100.0
70 or more           323          98.8
72 or more           306          93.6
74 or more           267          81.7
76 or more           205          62.7
78 or more           147          45.0
80 or more            95          29.1
82 or more            60          18.3
84 or more            38          11.6
86 or more            20           6.1
88 or more             7           2.1
90 or more             3            .9
92 or more             1            .3

* As indicated in Ch. IX, the lower limits are 67.9500..., 69.9500...,
etc. When rounded to whole percentages, these become 68, 70, etc.
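An “or more” accumulation runs in the opposite direction: each figure is the number of cases not yet passed as we move up the scale, and it is paired with the lower limit of its class. A minimal sketch, again assuming the class frequencies of the midshipmen's grades and using illustrative variable names:

```python
# Rounded lower limits of the classes and the class frequencies.
lower_limits = list(range(68, 94, 2))
freqs = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]
total = sum(freqs)

# Start with everyone and drop each class as its lower limit is passed.
or_more = []
remaining = total
for f in freqs:
    or_more.append(remaining)
    remaining -= f

# Points of the "or more" ogive: (lower limit, number at that grade or above).
ogive_points = list(zip(lower_limits, or_more))
percent = [round(100 * c / total, 1) for c in or_more]

for (grade, count), pct in zip(ogive_points, percent):
    print(f"{grade} or more: {count:4d}  {pct:5.1f}")
```

At any given class limit the “less than” and “or more” figures together account for all 327 midshipmen, so either table can be derived from the other.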

Chart 93. Cumulative Distribution of Grades of the 1937 Graduating Class of the
United States Naval Academy, Showing Number of Midshipmen Receiving Stated
Grade or Above. (Data of Table 32.) [Axes: number of midshipmen against grade.]

Cumulative frequency tables and ogives are often used to present data
of wages and of hours of work. With reference to wages, they enable us
to ascertain how many (or what proportion) of a group receive less than a
subsistence level, standard level, or comfort level. Similarly, we can
ascertain the number or proportion receiving a subsistence level or more,
a standard level or more, and a comfort level or more. It is also possible
to ascertain what wage the lowest (or highest) paid 10, 25, 50, or other
per cent of the workers are receiving. With respect to hours of work, we
can see quickly the number or proportion working unusually long or short
hours.

Comparison of ogives. If two cumulative frequency distributions are 
based upon nearly the same number of items, their ogives may be plotted 
and compared in absolute terms. If, however, the two series are based 
upon different totals, it is essential that comparisons be based upon the 
percentage frequencies. Two “or more” ogives relative to hours of work
in 1933 and 1935 in the paper-box industry are shown in Chart 94. Because
the 1935 ogive is steeper than that for 1933, it is apparent that a
larger proportion of employees was working very long and very short hours
in 1933 than in 1935, while a larger proportion was concentrated around an
intermediate figure, the former NRA code level, in 1935. These conclusions
would also be apparent if we examined two non-cumulative curves
of the percentage frequencies, for they would differ in respect to dispersion.
The ogives, however, are unique in showing cumulative comparisons such
as: (1) a larger proportion of employees worked 32 hours or more in 1935;
(2) about the same proportion worked 44 hours or more in the two years;
(3) a larger proportion worked 48 hours or more in 1933.

Chart 94. Cumulative Percentage Distributions of Weekly Hours of Labor of
Employees of Set-Up Paper-Box Factories, 1933 and 1935, Showing Percentages Working
Stated Number of Hours or More. (Based on data in United States Bureau of Labor
Statistics, Wages, Hours, and Working Conditions in the Set-Up Paper-Box Industry,
1933, 1934, and 1935, Bulletin No. 633, p. 24.)

The Lorenz curve. The 1935 Census of Manufactures enumerated
2,427 establishments engaged primarily in the manufacture of ice cream.
The second column of figures in Table 33 shows the number of establishments
grouped according to the amount of ice cream produced by the firms
in each group, while the third column of figures shows the amount of ice
cream produced by all the firms in the group. The percentage figures are
based upon these two columns. Notice that the 44 largest firms produced
more ice cream than the 1,675 smallest firms comprising the first four
groups. It may be seen also that the 474 small firms in the first two
groups, amounting to 19.5 per cent of the total number of establishments,
produced 1.7 per cent of the total output; whereas the 44 largest firms,
constituting but 1.8 per cent of all establishments, were responsible for
29.7 per cent of the ice cream produced. This tendency of production
to be concentrated in a few large firms may be represented graphically
by a Lorenz curve. In order to construct a Lorenz curve, the two sets
of percentages are cumulated on a “less than” basis as shown in the
last two columns of Table 33. These sets of cumulative percentages
are then plotted against each other as shown in Chart 95. The diagonal
line in the chart represents the line of even distribution, a condition which
could not obtain unless all establishments were equally productive. The
diagonal serves, however, as a basis of reference, since the more the plotted
curve departs from the diagonal the less uniform is the distribution of
production among the establishments of different size.

TABLE 34

Estimated Distribution of Income of 39,458,300 Families and Single Individuals
in the United States, July 1935-June 1936

                                   Number of       Number per      Number receiving
                                   families and    $100 income     lower limit of
Income class                       single          (frequency      each class
                                   individuals     densities)      or more
Under $250                          2,123,534          *            39,458,300
$    250 but under       500        4,587,377      1,834,951        37,334,766
     500 but under       750        5,771,960      2,308,784        32,747,389
     750 but under     1,000        5,876,078      2,350,431        26,975,429
   1,000 but under     1,250        4,990,995      1,996,398        21,099,351
   1,250 but under     1,500        3,743,428      1,497,371        16,108,356
   1,500 but under     1,750        2,889,904      1,155,962        12,364,928
   1,750 but under     2,000        2,296,022        918,409         9,475,024
   2,000 but under     2,250        1,704,535        681,814         7,179,002
   2,250 but under     2,500        1,254,076        501,630         5,474,467
   2,500 but under     3,000        1,475,474        295,095         4,220,391
   3,000 but under     3,500          851,919        170,384         2,744,917
   3,500 but under     4,000          502,159        100,432         1,892,998
   4,000 but under     4,500          286,053         57,211         1,390,839
   4,500 but under     5,000          178,138         35,628         1,104,786
   5,000 but under     7,500          380,266         15,211           926,648
   7,500 but under    10,000          215,642          8,626           546,382
  10,000 but under    15,000          152,682          3,054           330,740
  15,000 but under    20,000           67,923          1,358           178,058
  20,000 but under    25,000           39,825            796.5         110,135
  25,000 but under    30,000           25,583            511.7          70,310
  30,000 but under    40,000           17,959            179.6          44,727
  40,000 but under    50,000            8,340             83.4          26,768
  50,000 but under   100,000           13,041             26.1          18,428
 100,000 but under   250,000            4,144              2.76          5,387
 250,000 but under   500,000              916               .366         1,243
 500,000 but under 1,000,000              240               .048           327
1,000,000 and over                         87            ...                87
Total                              39,458,300

* No frequency density is shown for this class because the lower limit of the class is not
indicated and thus the width of the class is not apparent.

Source: National Resources Committee, Consumer Incomes in the United States, Their
Distribution in 1935-36, p. 6.

Chart 95. Lorenz Curve Showing Concentration of Production of Ice Cream, 1935.
(Data of Table 33.) [Axes: per cent of production against per cent of number of
establishments.]

The Lorenz curve is not limited to showing the type of data in Table
33. It is also useful for showing concentration of personal or corporate
income. By drawing two or more Lorenz curves, we may compare income
distributions at different times or places and, in the case of corporations,
between different industries. To plot a Lorenz curve, we must, of course,
know not only the frequencies for each class of the distribution but also
the total amount for each class (as in the second and third columns of
Table 33). It should be noted that class intervals of equal width are not
required for the Lorenz curve.
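The construction of the Lorenz curve can be sketched as follows. Since the figures of Table 33 are not reproduced in this section, the four classes used here are purely hypothetical, and the function name `lorenz_points` is illustrative. The classes are assumed to be arranged from the smallest units to the largest, and the curve is started at the origin.

```python
from itertools import accumulate


def lorenz_points(counts, amounts):
    """Cumulative "less than" percentages of units and of total amount.

    `counts` holds the number of units in each class and `amounts` the
    total amount contributed by each class, smallest class first.
    """
    n_total = sum(counts)
    a_total = sum(amounts)
    cum_units = [100 * c / n_total for c in accumulate(counts)]
    cum_amount = [100 * a / a_total for a in accumulate(amounts)]
    return [(0.0, 0.0)] + list(zip(cum_units, cum_amount))


# Hypothetical data: number of establishments and output in each size class.
counts = [50, 30, 15, 5]
amounts = [10, 20, 30, 40]

points = lorenz_points(counts, amounts)
for x, y in points:
    # On the line of even distribution y would equal x at every point.
    print(f"{x:6.1f} per cent of units -> {y:6.1f} per cent of amount")
```

Nothing in the calculation depends on the width of the classes, which is why unequal class intervals cause no difficulty here.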

The Pareto curve. In Table 34 are shown data of the distribution of
incomes in the United States in 1935-1936 as estimated by the National
Resources Committee. It will be observed, first, that the class intervals
are of varying size; second, that the numbers of cases falling in the various
classes are very great for some classes but quite small for others; and third,
that the distribution is decidedly skewed.

To draw a graph of this distribution in the conventional manner is
virtually an impossibility. It would be necessary to use frequency densities
because of the varying class intervals, and these are given in the table.
If the horizontal scale were 10 feet long to allow for values from 0 to
$1,000,000 only, then the horizontal distance for each of the first ten classes
would be just a little less than 1/32 of an inch! Furthermore, if the vertical
scale were about one foot high, the curve would be only a small fraction of
an inch above the X-axis after passing over about 1¼ inches of the horizontal
scale (at about $10,000). The reader should try to visualize this curve, as it
is, of course, not feasible to include such a chart in a book of ordinary size.

Chart 96. Pareto Curve of Cumulative Distribution of Annual Incomes of 37,334,766
Families and Single Individuals in the United States Receiving $250 or More, July
1935-June 1936. (Data of Table 34.) [Vertical axis: number.]

One device which will enable us to depict this sort of distribution
graphically is often designated as the Pareto curve. To draw a Pareto
curve, the frequencies are first cumulated on an “or more” basis, and then
an ogive is plotted on logarithmic axes. Chart 96 shows a Pareto curve
of the data of Table 34. As in the case of all “or more” ogives, the
cumulative frequencies are plotted against the lower limits of the classes.
Apparently it was because the major portion of such curves usually
approximated a straight line that Pareto enunciated his law of income
distribution.* Although this law has been discredited as a rigid generality, the
chart is useful as a graphic device for plotting a frequency distribution
which has marked positive skewness, or for comparing two or more such
distributions. Although the double logarithmic grid is most frequently
used for showing income distributions, it may be used for any frequency
distribution which is so markedly skewed to the right that arithmetic
plotting will not suffice. An important characteristic of such a cumulative
curve is that the steeper the curve, the more nearly equal the distribution
(that is, the less the dispersion). Two considerations in constructing such
a chart as Chart 96 should be noted: first, it is not possible to show negative
incomes; second, zero incomes cannot be shown. The curve of Chart 96,
therefore, begins with the number of recipients of incomes of $250 or more.

Chart 97. Pareto Curve of Frequency Densities of Annual Incomes of 37,334,766
Families and Single Individuals in the United States, Receiving $250 to $1,000,000,
July 1935-June 1936. (Data of Table 34.) [Vertical axis: number per $100 of income.]

* For a discussion see Income in the United States, Vol. I (1921), pp. 118-124, and
Vol. II (1922), Ch. 28, National Bureau of Economic Research, New York.

It is not necessary to cumulate the frequencies in order to plot them on 
the two logarithmic scales. The non-cumulative frequency densities may 
be plotted to give results as shown in Chart 97. 
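Plotting on a double logarithmic grid amounts to plotting the logarithms of both variables on ordinary scales. The sketch below computes those coordinates for the cumulative (“or more”) form of the curve, assuming the lower class limits and cumulative counts as read from Table 34; the variable names are illustrative.

```python
import math

# Lower class limits in dollars, from $250 upward, as read from Table 34.
lower_limits = [
    250, 500, 750, 1_000, 1_250, 1_500, 1_750, 2_000, 2_250, 2_500,
    3_000, 3_500, 4_000, 4_500, 5_000, 7_500, 10_000, 15_000, 20_000,
    25_000, 30_000, 40_000, 50_000, 100_000, 250_000, 500_000, 1_000_000,
]

# Number receiving each lower limit or more.
or_more = [
    37_334_766, 32_747_389, 26_975_429, 21_099_351, 16_108_356,
    12_364_928, 9_475_024, 7_179_002, 5_474_467, 4_220_391, 2_744_917,
    1_892_998, 1_390_839, 1_104_786, 926_648, 546_382, 330_740, 178_058,
    110_135, 70_310, 44_727, 26_768, 18_428, 5_387, 1_243, 327, 87,
]

# Coordinates on ordinary scales that reproduce the double logarithmic plot.
log_points = [
    (math.log10(x), math.log10(n)) for x, n in zip(lower_limits, or_more)
]

for x, y in log_points:
    print(f"{x:6.3f}  {y:6.3f}")
```

The class beginning at zero is omitted because zero has no logarithm, which is the reason the curve must begin at $250.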

We frequently encounter curves which are skewed to the right (although
not so much as the one just considered). The non-cumulative data may
be plotted against a logarithmic X-axis and a natural Y-axis. A series
of this type is shown in Charts 125 and 126. The curve of Chart 126
appears nearly symmetrical. In Chapter XI a logarithmic normal curve
is fitted to the series.
CHAPTER IX

MEASURES OF CENTRAL TENDENCY


We have seen how to construct a frequency distribution and how to 
draw a frequency curve. From either the classified data or the chart 
it is obvious that there are certain values that are frequently present 
and others that occur less frequently. Most of the curves that we
encounter are of the type that is very roughly “bell-shaped,” as shown in
Charts 81, 83, 84, and 85. For such series as these charts represent, it is 
obvious that the more characteristic values are in the central part of the 
distributions. We therefore use the term measures of central tendency (or 
averages) to identify these values which may be computed in an attempt to 
characterize the frequency distribution. We shall discuss in this chapter 
the arithmetic mean, the median, the mode, and, briefly, the geometric 
mean and the harmonic mean. 

In the following chapter we shall consider measures of dispersion, which 
refer to the spread of a distribution; measures of skewness, which measure 
the direction and amount of asymmetry; and measures of kurtosis, which 
indicate the degree of “peakedness” of a series.

The Arithmetic Mean 

The arithmetic mean from ungrouped data. The arithmetic mean is 
in such constant everyday use that nearly all of us are familiar with the 
concept. Sometimes we refer to the arithmetic mean merely as “the 
average” or “the mean,” but we always use the appropriate adjective when
we are speaking of the geometric mean, the harmonic mean, or some other 
less usual mean. 

The arithmetic mean of a series of items is obtained by adding the values
of the items and dividing by the number of items. Suppose that in a certain
small city 1-pound loaves of fresh white bread are selling for 8¢, 10¢,
11¢, and 12¢ a loaf. The arithmetic mean of these four figures would be
given by

\[ \frac{8\text{¢} + 10\text{¢} + 11\text{¢} + 12\text{¢}}{4} = \frac{41\text{¢}}{4} = 10.25\text{¢}. \]

If we let $X_1$, $X_2$, $X_3$, etc., indicate the various values; $N$, the number of
items; and $\bar{X}$, the arithmetic mean, we have

\[ \bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}. \]

Or, more briefly, using the summation symbol $\Sigma$, we may say

\[ \bar{X} = \frac{\Sigma X}{N}. \]

The foregoing computation of the arithmetic mean involved no consideration
of the fact that different amounts of bread may have been sold at
the various prices. When an arithmetic mean is computed in this fashion,
it may be referred to as a simple arithmetic mean. It is not correct to
refer to this mean as an unweighted arithmetic mean since each of the
prices was weighted equally. Let us proceed to compute a properly
weighted arithmetic mean, considering the fact that there were sold 10,000
loaves at 8¢, 8,000 loaves at 10¢, 4,000 loaves at 11¢, and 1,000 loaves at
12¢. We now have

\[ \bar{X} = \frac{(10{,}000 \times 8\text{¢}) + (8{,}000 \times 10\text{¢}) + (4{,}000 \times 11\text{¢}) + (1{,}000 \times 12\text{¢})}{23{,}000} = \frac{216{,}000\text{¢}}{23{,}000} = 9.39\text{¢}. \]

If we use the symbols $f_1$, $f_2$, $f_3$, etc., to indicate the numbers or frequencies
associated with each value being averaged, we have

\[ \bar{X} = \frac{f_1 X_1 + f_2 X_2 + f_3 X_3 + \cdots}{f_1 + f_2 + f_3 + \cdots} = \frac{\Sigma fX}{\Sigma f} = \frac{\Sigma fX}{N}. \]

Ordinarily an arithmetic mean is considered to be a weighted arithmetic
mean, as just described, unless otherwise specified.

It should be noted that, although the arithmetic mean price of bread is
9.39¢ per loaf, no bread is actually sold at this exact price. The arithmetic
mean must therefore be thought of as a computed value and not as a
value which actually exists.
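The difference between the simple and the weighted mean can be checked with a short calculation. The sketch assumes the four prices to be 8, 10, 11, and 12 cents, which are the values consistent with the totals of 216,000 cents and 23,000 loaves given in the text; the variable names are illustrative.

```python
# Prices in cents per loaf and the number of loaves sold at each price.
prices = [8, 10, 11, 12]
loaves = [10_000, 8_000, 4_000, 1_000]

# Simple arithmetic mean: every price counts once.
simple_mean = sum(prices) / len(prices)

# Weighted arithmetic mean: every price counts once per loaf sold at it.
total_value = sum(f * x for f, x in zip(loaves, prices))
weighted_mean = total_value / sum(loaves)

print(simple_mean)               # 10.25
print(total_value, sum(loaves))  # 216000 23000
print(round(weighted_mean, 2))   # 9.39
```

The weighted mean falls below the simple mean because the cheapest bread was sold in the largest quantity.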

Properties of the arithmetic mean. One important property of the
arithmetic mean is that the algebraic sum of the deviations of the various
values from the mean equals zero. This is important since it will enable
us to develop a method for computing $\bar{X}$ which will save an appreciable
amount of time when we are dealing with a frequency distribution. Let
us consider a series of five values, 6, 8, 9, 11, 14, each one of which occurs
but once. Then

\[ \bar{X} = \frac{6 + 8 + 9 + 11 + 14}{5} = \frac{48}{5} = 9.6. \]

Now let us compute the deviation of each value from the arithmetic mean,
$x_1 = X_1 - \bar{X}$, $x_2 = X_2 - \bar{X}$, $x_3 = X_3 - \bar{X}$, etc. We have

    X       x
    6     -3.6
    8     -1.6
    9      -.6
   11     +1.4
   14     +4.4

It will be observed that $\Sigma x = 0$; this is always true for any series of values.¹
If we compute the deviations $d$ of the five items from some designated
value which is not the arithmetic mean, the sum of these deviations $\Sigma d$
will not equal zero. If the designated value is less than the arithmetic
mean, there will be too many positive deviations and the sum of the
deviations will be greater than zero. If the designated value is greater than
the arithmetic mean, there will be too many negative deviations and the
sum of the deviations will be a negative quantity. Since each of the five
($N$) items has been compared to a designated number which is not the
true mean, the sum of the deviations will fail to equal zero by an amount
which is exactly five ($N$) times the amount by which the designated value
deviates from the actual arithmetic mean. It is therefore possible to
designate some value as an assumed mean $\bar{X}_d$, to determine the deviations
from this designated value, and, by adding (algebraically) the necessary
correction $\frac{\Sigma d}{N}$, to obtain the arithmetic mean.² The process is illustrated
in Table 35, where $\bar{X}_d$ is taken as 9. Here it is observed that $\Sigma d = +3$.
If we divide this figure by $N$, we see that $\bar{X}_d$ was too small by .6. This
is given by

\[ \frac{\Sigma d}{N} = \frac{+3}{5} = +.6. \]

This is the correction to be added to the assumed mean; thus

\[ \bar{X} = \bar{X}_d + \frac{\Sigma d}{N} = 9 + \frac{3}{5} = 9.6, \]

which agrees exactly with $\bar{X}$ computed by adding the values and dividing
by 5.

¹ See Appendix B, section IX-1. If $\Sigma x = 0$, it is obvious that $\frac{\Sigma x}{N} = 0$. $\frac{\Sigma x}{N}$ is
referred to as the “first moment about the mean,” or merely as the “first moment.” In
the following chapter we shall have occasion to consider the second moment $\frac{\Sigma x^2}{N}$, the
third moment $\frac{\Sigma x^3}{N}$, and the fourth moment $\frac{\Sigma x^4}{N}$.

² See Appendix B, section IX-2.
TABLE 35

Calculation of the Arithmetic Mean, $\bar{X}$,
by Use of the Assumed Mean, $\bar{X}_d = 9$

    X      d
    6     -3
    8     -1
    9      0
   11     +2
   14     +5
          +3

\[ \bar{X} = \bar{X}_d + \frac{\Sigma d}{N} = 9 + \frac{3}{5} = 9.6. \]

In the foregoing illustration $\bar{X}_d$ was less than $\bar{X}$. Suppose we choose
$\bar{X}_d$ as 13. The computations are shown in Table 36.

TABLE 36

Calculation of the Arithmetic Mean, $\bar{X}$,
by Use of the Assumed Mean, $\bar{X}_d = 13$

    X      d
    6     -7
    8     -5
    9     -4
   11     -2
   14     +1
         -17

\[ \bar{X} = \bar{X}_d + \frac{\Sigma d}{N} = 13 + \frac{-17}{5} = 9.6. \]

In this case $\bar{X}_d$ was larger than $\bar{X}$, as is indicated by $\frac{\Sigma d}{N} = \frac{-17}{5} = -3.4$.
The result is, as before, $\bar{X} = 13 - 3.4 = 9.6$.
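The assumed-mean procedure of Tables 35 and 36 can be sketched as a small function. The name `mean_from_assumed` is illustrative and not part of the text; the five values and the two assumed means are those used above.

```python
def mean_from_assumed(values, assumed):
    """Arithmetic mean obtained by correcting an assumed mean.

    The deviations d are taken from `assumed`; their sum divided by the
    number of items is the correction added (algebraically) to it.
    """
    deviations = [v - assumed for v in values]
    correction = sum(deviations) / len(values)
    return assumed + correction


values = [6, 8, 9, 11, 14]

print(sum(v - 9 for v in values))     # +3, as in Table 35
print(sum(v - 13 for v in values))    # -17, as in Table 36
print(mean_from_assumed(values, 9))   # about 9.6
print(mean_from_assumed(values, 13))  # about 9.6
```

Any assumed mean gives the same result; a value near the center of the series merely keeps the deviations small and the arithmetic light.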

A second property of the arithmetic mean, which is of importance in
connection with later discussions, is that the sum of the squared deviations
is less when the deviations are taken around $\bar{X}$ than when they are
taken around any other value.
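This second property can be illustrated numerically with the same five values. The sketch is a check on a few trial values, not a proof; the function name `sum_sq_dev` is illustrative.

```python
def sum_sq_dev(values, center):
    """Sum of the squared deviations of `values` from `center`."""
    return sum((v - center) ** 2 for v in values)


values = [6, 8, 9, 11, 14]
mean = sum(values) / len(values)   # 9.6

around_mean = sum_sq_dev(values, mean)
print(round(around_mean, 2))       # 37.2
print(sum_sq_dev(values, 9))       # 39
print(sum_sq_dev(values, 13))      # 95
```

In each trial the excess over the minimum equals $N$ times the square of the distance between the trial value and the mean (for example, $39 - 37.2 = 5 \times .6^2$), which is why no other value can give a smaller sum.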

The arithmetic mean from grouped data: long method. Table 37 shows
the frequency distribution of the grades of midshipmen and it is desired
to ascertain the value of $\bar{X}$ for the series. When dealing with a frequency
distribution, we do not ordinarily have the original data from which the
frequency distribution was made. When we do have the unclassified data
(as in Table 25), we can obtain the value of the arithmetic mean most
accurately by totaling the values and dividing by the number of items.
When we have only the frequency distribution, we must compute the
mean from the grouped data. Let us proceed to compute $\bar{X}$ for the
frequency distribution of Table 37, and then compare our result with the
arithmetic mean computed from the unclassified data.


TABLE 37

Calculation of Arithmetic Mean of Grouped Data by
Long Method for Grades of the 1937 Graduating
Class of the United States Naval Academy

               Mid-values     Number of
               of classes     midshipmen
Grade              X              f              fX
68.0-69.9        68.95             4           275.80
70.0-71.9        70.95            17         1,206.15
72.0-73.9        72.95            39         2,845.05
74.0-75.9        74.95            62         4,646.90
76.0-77.9        76.95            58         4,463.10
78.0-79.9        78.95            52         4,105.40
80.0-81.9        80.95            35         2,833.25
82.0-83.9        82.95            22         1,824.90
84.0-85.9        84.95            18         1,529.10
86.0-87.9        86.95            13         1,130.35
88.0-89.9        88.95             4           355.80
90.0-91.9        90.95             2           181.90
92.0-93.9        92.95             1            92.95
Total                            327        25,490.65

\[ \bar{X} = \frac{\Sigma fX}{N} = \frac{25{,}490.65}{327} = 77.95. \]


In computing the arithmetic mean from a frequency distribution, we
take the mid-value (sometimes called the class mark) of each class as
representative of that class, multiply the various mid-values by their
corresponding frequencies, total these products, and divide by the total
number of items. Symbolically, if $X_1, X_2, X_3, \ldots$ represent the
mid-values and $f_1, f_2, f_3, \ldots$ the frequencies, then

$$\bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + \cdots}{f_1 + f_2 + f_3 + \cdots} = \frac{\Sigma fX}{N}.$$
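
The long method can be sketched in a few lines of Python. This is only an illustration of the expression above applied to the mid-values and frequencies of Table 37; the function name `grouped_mean` is hypothetical.

```python
def grouped_mean(mid_values, frequencies):
    """Arithmetic mean of a frequency distribution: sum of fX divided by N."""
    n = sum(frequencies)
    total = sum(f * x for f, x in zip(frequencies, mid_values))
    return total / n

# Mid-values X and frequencies f of Table 37
mid_values = [68.95, 70.95, 72.95, 74.95, 76.95, 78.95, 80.95,
              82.95, 84.95, 86.95, 88.95, 90.95, 92.95]
frequencies = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]

mean = grouped_mean(mid_values, frequencies)
print(round(mean, 2))   # agrees with the 77.95 shown under Table 37
```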

The mid-value of a class is obtained by adding the upper and lower
limits of the class and dividing by 2. For every frequency distribution
we must consider carefully what those limits are. Considering the
distribution of Table 37, we might take the limits of the third class as 72.0
and 74.0, giving a mid-value of 73.0. This would be correct if the grades had
each been rounded to the last completed tenth so that 72.0 included values
ranging from exactly 72 to 72.099..., 72.1 included values from exactly
72.1 to 72.199..., etc., instead of having been rounded to the nearest
tenth, as was actually done. If rounding had been to the last completed
tenth, the class should have been designated “72 and under 74.” Since
we are dealing with a continuous variable, the limits of such a class would
be 72 and 74, and the mid-value 73. For the midshipmen’s grades, rounding
was to the nearest tenth and the lowest value in the class “72.0-73.9”
is 71.95, while the highest value is 73.9499.... Thus, since the variable
is continuous, the class limits are 71.95 and 73.95, and the mid-value is
72.95. The mid-values have been entered in Table 37 according to this
procedure.

When a class is designated (for example) “32.00-33.99,” the mid-value 
is actually 32.995. Many statisticians would, however, state the mid- 
value as 33.00 since the relative discrepancy is small. In determining the 
mid-values for a frequency distribution it is important to know how the 
readings were rounded. When no information concerning the rounding is 
given in connection with the frequency distribution, it is probably best to 
assume that figures were rounded to the nearest unit given. For example, 
if a one-inch class is written “12.0-12.9 inches,” consider the limits as 
11.95 and 12.95 inches; if a five-pound class is written “10-14 pounds,” 
consider the limits as 9.5 and 14.5 pounds. However, for discrete data, 
a $2 class “$10.00-$11.99” has the limits $10.00 and $11.99, and a $10 
class “$70-$79” has the limits $70 and $79 if data were given only in 
whole dollars. A class should not be written “5 pounds but under 10 
pounds” unless we mean exactly what we say; namely, that items in this
class do not fall below 5 pounds and do not equal 10 pounds. 
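
The two rounding conventions for a continuous variable may be made concrete with a short Python sketch. The function names are hypothetical, and exact decimal arithmetic is used so that values such as 71.95 are not disturbed by binary floating point. Discrete data (the dollar classes above) keep their stated limits and are not covered by these helpers.

```python
from decimal import Decimal

def limits_rounded_to_nearest(lower_label, upper_label, unit):
    """True class limits when readings were rounded to the nearest unit."""
    half = Decimal(unit) / 2
    return Decimal(lower_label) - half, Decimal(upper_label) + half

def limits_last_completed(lower_label, upper_label, unit):
    """True class limits when readings were cut to the last completed unit."""
    return Decimal(lower_label), Decimal(upper_label) + Decimal(unit)

def mid_value(limits):
    """Mid-value: the sum of the lower and upper limits divided by 2."""
    lower, upper = limits
    return (lower + upper) / 2

# Third class of Table 37, written "72.0-73.9"
nearest = limits_rounded_to_nearest("72.0", "73.9", "0.1")   # 71.95 and 73.95
completed = limits_last_completed("72.0", "73.9", "0.1")     # 72.0 and 74.0
print(mid_value(nearest), mid_value(completed))

# Other classes mentioned in the text, assuming rounding to the nearest unit
print(limits_rounded_to_nearest("12.0", "12.9", "0.1"))      # inches
print(limits_rounded_to_nearest("10", "14", "1"))            # pounds
print(mid_value(limits_rounded_to_nearest("32.00", "33.99", "0.01")))
```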

Considering the mid-values for the grades of midshipmen as discussed
above and using the expression $\bar{X} = \frac{\Sigma fX}{N}$, we find that
the arithmetic mean is 77.95, as shown under Table 37. From the unclassified
data of Table 25, let us compute the value of $\bar{X}$ to see how nearly the
figure just obtained agrees with that value. If we total all of the
individual grades and divide by 327, we have

$$\bar{X} = 77.94.$$


The value obtained from the frequency distribution is in very close
agreement with this; in fact, the error is only +.013 of one per cent. This is
an unusually close agreement, but we can generally count on a difference
of not more than a few per cent at most. The value of the arithmetic
mean computed from a frequency distribution will generally be in close
agreement with the arithmetic mean from the unclassified data if the
variable is continuous and the distribution is symmetrical. If (1) the
distribution is skewed or if (2) the variable is discrete (or if the data are
broken), or if both (1) and (2) are true, the agreement will be less close.
Likewise, close agreement cannot be expected if the data contain
irregularities because an unduly small sample was used.

Whatever lack of agreement is present is due to the inadequacy of the
mid-value assumptions. It is almost always true that none of the
mid-values is actually the true concentration point of its class. However, a
glance at Chart 81 will suggest that for groups to the left of the group of
maximum frequency the mid-value of a group is probably less than the mean of
that group, while for groups to the right of the group of maximum frequency
the mid-value of a group probably exceeds the mean of that group. Although
all the mid-value assumptions are incorrect, there is a definite
tendency for the errors to offset each other, provided the distribution is
approximately symmetrical. Since we have the unclassified data from
which the frequency distribution was made, we can compute the mean for
each class and compare the class means and class mid-values. This has
been done in Table 38; it is observed that for the first 3 classes the
mid-value of each class is less than the class mean, while for 7 of the last 9
classes the mid-values exceed the class means.

TABLE 38

Comparison of Class Means and Class Mid-Values for Grades of Midshipmen

                 Number of      Total of grades      Class       Class
   Grade         midshipmen     in each class        mean        mid-value
                                (from Table 28)

 68.0-69.9            4              276.7           69.18        68.95
 70.0-71.9           17            1,212.5           71.32        70.95
 72.0-73.9           39            2,851.2           73.11        72.95
 74.0-75.9           62            4,641.3           74.86        74.95
 76.0-77.9           58            4,462.?           76.93        76.95
 78.0-79.9           52            4,103.4           78.91        78.95
 80.0-81.9           35            2,833.8           80.97        80.95
 82.0-83.9           22            1,821.5           82.80        82.95
 84.0-85.9           18            1,528.3           84.91        84.95
 86.0-87.9           13            1,124.4           86.49        86.95
 88.0-89.9            4              357.5           89.38        88.95
 90.0-91.9            2              181.2           90.60        90.95
 92.0-93.9            1               92.1           92.10        92.95


The arithmetic mean from grouped data: short methods. In Tables
35 and 36 (see also Appendix B, section IX-2), it was shown that we could
assume a value $\bar{X}_d$ for the arithmetic mean and, making use of the fact
that $\Sigma(X - \bar{X}) = 0$, compute the necessary correction to obtain
$\bar{X}$. This method will save us appreciable time in computing the mean
from a frequency distribution. The expression for $\bar{X}$ is as before,
except that the symbol $f$ is introduced because of the frequencies in the
various classes. Thus

$$\bar{X} = \bar{X}_d + \frac{\Sigma fd}{N}.$$

The selected value for $\bar{X}_d$ may be the mid-value of any class. In
Table 39, $\bar{X}_d$ has been taken as the mid-value of the sixth class and
the computations below the table show that $\bar{X} = 77.95$, the same as
found by the longer method of Table 37.

It will be observed that all of the classes of Table 39 are of the same
width. When this is true, we may further shorten our computation of
$\bar{X}$ by taking our deviations from $\bar{X}_d$ in terms of class intervals
$d'$. Our correction $\frac{\Sigma fd'}{N}$ will then be in terms of class
intervals and must be multiplied by the class interval $i$ before being
algebraically added to $\bar{X}_d$. For the mean, then,

$$\bar{X} = \bar{X}_d + \left(\frac{\Sigma fd'}{N}\right) i.$$

The computation of $\bar{X}$ by this expression is shown in Table 40 and
yields the same result as given in Tables 37 and 39. This method should
always be used when a frequency distribution is made up of equal class
intervals. The greater the number of classes and the greater the number of
items included in a frequency distribution, the more time is saved by this
procedure. The saving is especially important when the interval is some
inconvenient number, such as 16.67, for example.


TABLE 39

Calculation of Arithmetic Mean of Grouped Data by Short Method for
Grades of the 1937 Graduating Class of the United States Naval Academy

(Deviations in original units)

                 Number of      Deviation from
   Grade         midshipmen     assumed mean          fd
                     f                d

 68.0-69.9           4              -10              - 40
 70.0-71.9          17              - 8              -136
 72.0-73.9          39              - 6              -234
 74.0-75.9          62              - 4              -248
 76.0-77.9          58              - 2              -116     -774
 78.0-79.9          52                0                 0
 80.0-81.9          35              + 2              + 70
 82.0-83.9          22              + 4              + 88
 84.0-85.9          18              + 6              +108
 86.0-87.9          13              + 8              +104
 88.0-89.9           4              +10              + 40
 90.0-91.9           2              +12              + 24
 92.0-93.9           1              +14              + 14     +448

 Total             327                                        -326

$$\bar{X} = \bar{X}_d + \frac{\Sigma fd}{N} = 78.95 - \frac{326}{327} = 78.95 - 1.00 = 77.95.$$
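
The short method with deviations in class intervals may be sketched as follows. This Python fragment is an illustration only; the variable names are hypothetical, and the figures are the frequencies of the midshipmen's grades with the sixth class (mid-value 78.95) taken as the assumed mean, as in Table 40.

```python
frequencies = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]
class_interval = 2        # width i of every class
assumed_mean = 78.95      # mid-value of the sixth class
assumed_index = 5         # zero-based position of the sixth class

# Deviations d' in class intervals: -5, -4, ..., 0, ..., +7
d_prime = [k - assumed_index for k in range(len(frequencies))]

n = sum(frequencies)
sum_fd = sum(f * d for f, d in zip(frequencies, d_prime))

# Correction in class intervals, converted to original units by i
mean = assumed_mean + (sum_fd / n) * class_interval
print(sum_fd, round(mean, 2))
```

Because the deviations are small whole numbers, the products can be formed mentally, which is where the saving in labor arises.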

The arithmetic mean from grouped data having unequal class intervals. 
The short methods which have just been described do not greatly facilitate 
our computations when we are dealing with a frequency distribution having 
classes of varying width. For frequency distributions of this type we 
may use the long method, multiplying each mid-value by the correspond- 
ing frequency, summing the products, and dividing by the number of 
items. When classes vary in width, the distribution is invariably skewed 
and we must remember that, as skewness increases, the errors in our mid- 
value assumptions offset each other less closely. Thus the mean com- 
puted from a frequency distribution having unequal class intervals may 
differ markedly from the mean computed from the unclassified data. Fur- 
thermore, as will be discussed at the end of this chapter, the arithmetic 
mean of a decidedly skewed distribution is seldom useful. 

TABLE 40

Calculation of Arithmetic Mean of Grouped Data by Short Method for
Grades of the 1937 Graduating Class of the United States Naval Academy

(Deviations in class intervals)

                 Number of      Deviation from
   Grade         midshipmen     assumed mean          fd'
                     f                d'

 68.0-69.9           4               -5              - 20
 70.0-71.9          17               -4              - 68
 72.0-73.9          39               -3              -117
 74.0-75.9          62               -2              -124
 76.0-77.9          58               -1              - 58     -387
 78.0-79.9          52                0                 0
 80.0-81.9          35               +1              + 35
 82.0-83.9          22               +2              + 44
 84.0-85.9          18               +3              + 54
 86.0-87.9          13               +4              + 52
 88.0-89.9           4               +5              + 20
 90.0-91.9           2               +6              + 12
 92.0-93.9           1               +7              +  7     +224

 Total             327                                        -163

$$\bar{X} = \bar{X}_d + \left(\frac{\Sigma fd'}{N}\right) i = 78.95 - \left(\frac{163}{327}\right) 2 = 78.95 - 1.00 = 77.95.$$

When the arithmetic mean is computed for the frequency distribution
of Table 41, the long method gives $\bar{X} = \$3{,}009$. It so happens that
the unclassified data are available, and if we add the salaries paid to each
of the 328 employees we obtain $965,780, which yields an arithmetic mean
of $2,944. The mean computed from the frequency distribution is 2.2
per cent too large because of the shortcomings of the mid-value
assumptions. It may be seen from Table 41, columns 2 and 6, that the mid-value
assumption is too small for the first two groups and too large for eleven
of the last twelve groups.


TABLE 41

Calculation of Arithmetic Mean of Grouped Data Having Unequal Class
Intervals and Comparison of Class Mid-Values and Class Means for
Annual Salaries Paid to Employees of the Board of Governors
of the Federal Reserve System, 1935

                                                         Total          Mean salary
     Salary         Mid-value                          salaries in     for each class
                       X*         f         fX         each class    [Col. 5 ÷ Col. 3]
      (1)             (2)        (3)        (4)           (5)              (6)

 $   800-$ 1,099    $   950        9     $  8,550      $  8,700         $   967
   1,100-  1,399      1,250       16       20,000        20,640           1,290
   1,400-  1,699      1,550       89      137,950       137,920           1,550
   1,700-  1,999      1,850       41       75,850        74,070           1,807
   2,000-  2,499      2,250       40       90,000        84,500           2,112
   2,500-  2,999      2,750       32       88,000        85,800           2,681
   3,000-  3,499      3,250       22       71,500        69,500           3,159
   3,500-  3,999      3,750       15       56,250        55,300           3,687
   4,000-  4,999      4,500       19       85,500        84,450           4,445
   5,000-  5,999      5,500       18       99,000        95,500           5,306
   6,000-  6,999      6,500        6       39,000        36,600           6,100
   7,000-  7,999      7,500        4       30,000        29,800           7,450
   8,000-  8,999      8,500        4       34,000        33,000           8,250
   9,000-  9,999      9,500        5       47,500        45,000           9,000
  10,000- 15,999     13,000        8      104,000       105,000          13,125

 Total                           328     $987,100      $965,780

* Strictly speaking, the mid-values are $949.50, $1,249.50, etc. We disregard this slight difference in
this problem because it represents only 1/6 of one per cent of the narrowest class intervals.

Source: Based on data given in Twenty-second Annual Report of the Board of Governors of the Federal
Reserve System, pp. 240-243.

$$\bar{X} = \frac{\Sigma fX}{N} = \frac{\$987{,}100}{328} = \$3{,}009.$$
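
The size of the discrepancy for the salary data can be reproduced directly from columns 2, 3, and 5 of Table 41. The Python sketch below is illustrative only, with hypothetical variable names; it compares the mean obtained from the mid-values with the mean obtained from the actual class totals.

```python
# Columns 2, 3, and 5 of Table 41
mid_values = [950, 1250, 1550, 1850, 2250, 2750, 3250, 3750,
              4500, 5500, 6500, 7500, 8500, 9500, 13000]
frequencies = [9, 16, 89, 41, 40, 32, 22, 15, 19, 18, 6, 4, 4, 5, 8]
class_totals = [8700, 20640, 137920, 74070, 84500, 85800, 69500, 55300,
                84450, 95500, 36600, 29800, 33000, 45000, 105000]

n = sum(frequencies)

# Long method: each mid-value stands for every salary in its class
grouped_mean = sum(f * x for f, x in zip(frequencies, mid_values)) / n

# Mean from the unclassified data: total of all salaries divided by N
actual_mean = sum(class_totals) / n

percent_error = (grouped_mean - actual_mean) / actual_mean * 100
print(round(grouped_mean), round(actual_mean), round(percent_error, 1))
```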

Sometimes a skewed distribution has an indeterminate (or open-end) group
at one end, and occasionally at both ends. For example, the last class in
Table 41 might have been written “$10,000 and over.” When such a class
is present, there is no indication of the value which should be chosen as
representative of the class. If it is assumed that the indeterminate group
has the same width as the preceding one, the mid-value will usually be
too low. The use of such a mid-value may result in offsetting the upward
bias of the preceding mid-values, but we can never be sure how much
offsetting takes place or that it may not even overbalance the bias. The
reason a class is left indeterminate is usually because it contains a few
scattering items over a wide range of values.

It should be emphasized that the value of the arithmetic mean com- 
puted for a skewed distribution of unequal class intervals is only a reason- 
ably good approximation. It becomes even less accurate when one or two 
indeterminate classes are present. The difficulty involved in the compu- 
tation of the mean for such a distribution is completely resolved if a foot- 
note is added to the table giving the total of the unclassified data. If this 
procedure is followed, a single division suffices to give the value of the 
arithmetic mean. 

Modified forms of the arithmetic mean. Instead of computing the 
arithmetic mean for all of a series of items, it may occasionally suffice to 
make an approximation by taking the average of the smallest and largest 
figures. The result of such a procedure will not differ greatly from the 
arithmetic mean if we are dealing with a continuous variable (or a discrete 
variable which does not show gaps) the distribution of which is symmetrical 
or nearly so. For example, meteorologists have found that it is not ordi- 
narily necessary to take hourly temperatures throughout a day and average 
these 24 readings to arrive at the daily mean temperature. It suffices to 
average only the maximum and minimum temperatures. These two read- 
ings may be obtained from the high and low points shown on the graph 
traced by a recording thermometer, or they may be had from a
thermometer which automatically records the maximum temperature and another
which automatically records the minimum temperature. 

It will be recalled that the data of midshipmen’s grades is skewed to the 
right. Consequently we should expect the average of the lowest and 
highest grades to exceed the arithmetic mean computed from all of the 
grades. Let us determine the average of these two extreme values and 
see how far it departs from X. The highest grade shown in Table 26 is 
92.1, while the lowest grade is 68.8. The average of these two grades is 
80.45. The value of X computed from the unclassified data was found 
to be 77.94. The discrepancy resulting from averaging the extremes is 
2.51, or 3.2 per cent, and indicates that we should not use this method as 
an approximation of $\bar{X}$ unless the distribution is symmetrical or nearly so.
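
The comparison just made may be sketched in Python. The function name `midrange` is a label adopted here for the average of the two extremes and is not the authors' term; the figures are those quoted above for the midshipmen's grades.

```python
def midrange(values):
    """Average of the smallest and largest values of a series."""
    return (min(values) + max(values)) / 2

lowest, highest = 68.8, 92.1     # extreme grades quoted from Table 26
mean_all_grades = 77.94          # mean of the unclassified data

approx = midrange([lowest, highest])
discrepancy = approx - mean_all_grades
percent = discrepancy / mean_all_grades * 100
print(round(approx, 2), round(discrepancy, 2), round(percent, 1))

# For a symmetrical series the two measures coincide.
symmetrical = [1, 2, 3, 4, 5]
print(midrange(symmetrical), sum(symmetrical) / len(symmetrical))
```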

A second modification of the arithmetic mean is one which will be
referred to again in connection with the measurement of seasonal movements
(Chapter XVII). This modification consists essentially either of ignoring
certain items on the basis that they are unusual extreme values, perhaps
resulting from the introduction of a non-homogeneous or non-comparable
factor into the situation, or of dropping one or more of the highest and
lowest values in an array so that only the more typical values are averaged.

Suppose that a runner has competed in the 100-yard dash in ten track
meets during a season and that his times were as follows:

10.2, 10.1, 10.0, 10.0, 10.1, 10.0, 9.9, 10.1, 11.4, 10.2 seconds.

Now an arithmetic mean of these ten figures is 10.2 seconds, although only
three races were run this slow or slower. In the race represented by
the ninth figure above, the runner was spiked and limped in to finish
an extremely poor last. The figure 11.4 does not indicate his running
ability and could quite logically be excluded in arriving at a mean time
which represents this runner’s ability. If we average the other nine
figures, we obtain 10.07 seconds as the arithmetic mean for this runner
under normal running conditions. In like fashion, if one race had been
run with a strong wind at the runner’s back, his time would be abnormally
short for the 100 yards and this figure too might be omitted.³ The
procedure just described differs from the one followed in measuring seasonal
movements in that only the particular values for which a specific reason
could be definitely assigned have been eliminated. When measuring
seasonal movements, we shall drop one, two, or more items at both ends of
an array in order to average the items which seem to cluster around some
central value.
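
Both forms of the modified mean may be sketched in Python using the runner's ten times. The function names are hypothetical. The first form omits only items for which a specific reason can be assigned (here the ninth race); the second drops a fixed number of items from each end of the array, as will be done when measuring seasonal movements.

```python
def mean(values):
    return sum(values) / len(values)

def mean_excluding(values, excluded_positions):
    """Mean after omitting the items at the given zero-based positions."""
    kept = [v for i, v in enumerate(values) if i not in excluded_positions]
    return mean(kept)

def mean_of_central_items(values, k):
    """Mean after dropping the k lowest and the k highest items of the array."""
    if 2 * k >= len(values):
        raise ValueError("too many items dropped")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

times = [10.2, 10.1, 10.0, 10.0, 10.1, 10.0, 9.9, 10.1, 11.4, 10.2]

all_races = mean(times)                       # about 10.2 seconds
without_spiking = mean_excluding(times, {8})  # ninth race omitted, about 10.07
central_eight = mean_of_central_items(times, 1)
print(round(all_races, 2), round(without_spiking, 2), round(central_eight, 4))
```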

³ A discussion of this type of modified mean when used in connection with time
studies is given in F. E. Croxton and D. J. Cowden, Practical Business Statistics, pp.
170-176, Prentice-Hall, Inc., New York, 1934.

Averaging percentages. It was pointed out in Chapter VII that a
series of percentages based on different numbers should ordinarily be
averaged by weighting each percentage in proportion to its base. There are
conditions, however, under which we might want to ignore the different
bases and to average several percentages using a different system of weights.
For example, let us assume that a student has taken two comprehensive
examinations, each covering one-half of the subject matter of a course.
Suppose that the first examination included 100 “true-false” questions,
upon which he made 82 per cent, while the second included 150 such
questions, upon which he made 88 per cent. Since each percentage
represents a level of accomplishment for one-half of the work of a term, a
better description of the work of the student for the term would weight the
two percentages equally, resulting in an average of

$$\frac{82 + 88}{2} = 85$$

rather than weight the percentages according to the number of questions
asked, giving

$$\frac{(100 \times 82) + (150 \times 88)}{250} = 85.6.$$

If the second examination had been based upon 10 “essay” questions, it is
even more apparent that the weighting should not be determined by the
number of questions included.
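
The two ways of averaging the examination percentages can be set side by side in a brief Python sketch (variable names are hypothetical):

```python
scores = [82, 88]          # per cent correct on each examination
questions = [100, 150]     # number of true-false questions on each

# Each examination covers one-half of the course: equal weights
equal_weight = sum(scores) / len(scores)

# Weights proportional to the bases (number of questions)
by_questions = sum(q * s for q, s in zip(questions, scores)) / sum(questions)

print(equal_weight, by_questions)
```

Which result is the better description depends on what the weights are meant to represent, not on the arithmetic.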

Averaging averages. The general outlines of the problem of averaging
averages are the same as those involved in averaging percentages. If we
have several averages, each referring to a category, and wish to average
these averages in order to arrive at a statement compatible with that
referring to the total composed of these categories, it is necessary to weight
each average according to the importance of its category. For example,
consider the data of Table 42, column 3, which shows the average net
earnings from current operations for nine groups of banks. If we add
these nine averages and divide by nine, we obtain the figure $108,895.
This figure is arrived at by giving each of the nine averages the same
weight, although there were varying numbers of banks in each category.
The total net earnings from current operations for all of the 7,379 banks
was $53,916,889, and dividing the latter figure by the former gives $7,306.80,
or $7,307, as the average net earnings from current operations. If each of
the nine averages in Table 42 is multiplied by the number of banks to
which it refers (see column 4), if these products are totaled giving
$53,917,112, and if this result is divided by 7,379, the average is $7,306.83,
or $7,307, which agrees very closely with that just given.


TABLE 42

Net Earnings from Current Operations During 1934 of Insured Commercial
Banks Not Members of the Federal Reserve System

                                               Average net          Approximate
                                               earnings per         total net earnings
 Banks having deposits of      Number of       bank from cur-       from current
                               banks           rent operations      operations
                                                                    [Col. 2 × Col. 3]
           (1)                    (2)               (3)                  (4)

 $  100,000 and under            1,186           $    694           $   823,084
    100,001-$   250,000          2,492              1,798             4,480,616
    250,001-    500,000          1,720              3,563             6,128,360
    500,001-    750,000            641              6,534             4,188,294
    750,001-  1,000,000            380              8,727             3,316,260
  1,000,001-  2,000,000            585             14,510             8,488,350
  2,000,001-  5,000,000            255             33,520             8,547,600
  5,000,001- 50,000,000            116            127,694            14,812,504
 Over 50,000,000                     4            783,011             3,132,044

 Total                           7,379                              $53,917,112

Source: Annual Report of the Federal Deposit Insurance Corporation for the Year Ending December 31,
1934, p. 57, and by correspondence.

As in the case of percentages, there may be some instances in which
the importance of each category is dependent upon some factor other than
the number of items included in the category. Suppose that 12 tires have
been run on a group of test trucks unloaded except for the driver, and
have shown an average mileage of 13,618 miles. Suppose that 20 similar
tires have been used on a similar group of test trucks each carrying the
driver and 2,000 pounds of load, and have shown an average mileage of
12,136 miles. The weighted average of mileage would be

$$\frac{(12 \times 13{,}618) + (20 \times 12{,}136)}{32} = 12{,}692 \text{ miles}.$$

What we have actually done is to assign $\frac{20}{12} = 1.67$ times as much
weight to the second average as to the first. Actually, trucks sometimes
travel unloaded, sometimes loaded, sometimes partly loaded, and sometimes
overloaded. If, for the purposes of illustration, we may assume that
trucks in actual use travel $\frac{1}{5}$ of their mileage unloaded and
$\frac{4}{5}$ of their mileage loaded, we should arrive at our average by

$$\frac{(1 \times 13{,}618) + (4 \times 12{,}136)}{5} = 12{,}432 \text{ miles}.$$

It is the importance of the various load conditions in the use of the truck
which should be considered in weighting rather than the number of tires
tested. If the truck travels empty $\frac{2}{3}$ of its mileage and loaded
$\frac{1}{3}$ of its mileage, we should average

$$\frac{(2 \times 13{,}618) + (1 \times 12{,}136)}{3} = 13{,}124 \text{ miles}.$$
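
The bank and tire illustrations both reduce to one operation, a weighted mean, with the choice of weights carrying the whole argument. The Python sketch below uses a hypothetical helper, `weighted_mean`, together with the figures of Table 42 and of the tire tests.

```python
def weighted_mean(values, weights):
    """Sum of weight times value, divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * v for v, w in zip(values, weights)) / total_weight

# Table 42: number of banks and average net earnings per bank
bank_counts = [1186, 2492, 1720, 641, 380, 585, 255, 116, 4]
avg_earnings = [694, 1798, 3563, 6534, 8727, 14510, 33520, 127694, 783011]

unweighted = sum(avg_earnings) / len(avg_earnings)    # about $108,895
by_banks = weighted_mean(avg_earnings, bank_counts)   # about $7,307

# Tire tests: average mileage unloaded and loaded
mileages = [13618, 12136]
by_tires = weighted_mean(mileages, [12, 20])   # weights = tires tested
mostly_loaded = weighted_mean(mileages, [1, 4])  # 1/5 unloaded, 4/5 loaded
mostly_empty = weighted_mean(mileages, [2, 1])   # 2/3 empty, 1/3 loaded

print(round(unweighted), round(by_banks, 2))
print(by_tires, mostly_loaded, mostly_empty)
```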


The Median 

The median from ungrouped data. The median is defined as that value 
which divides a distribution so that an equal number of items are on either 
side of it. If we have five items, $5, $6, $7, $8, $10, it is apparent that 
the value of the median is $7, since there are two items below that value 
and two items above it. If we have six items, 2 inches, 5 inches, 6 inches,
7 inches, 9 inches, 12 inches, it is clear that any value greater than 6 inches
and less than 7 inches will satisfy our definition. As a matter of practice,
when there are an even number of items, we usually take the value of the
median as halfway between the two central items. In this instance the
median would be 6.5 inches.

From what has already been said, it is obvious that the median cannot
readily be located unless the data have been put into an array or, as we 
shall see shortly, into a frequency distribution. It will be recalled that 
no arranging is necessary for computing the mean, since the items of a 
series may be totaled no matter what their order. 

The value of the median of a series may or may not coincide with the 
value of an existing item. When there are an odd number of items in an 
array, the value of the median coincides with that of one of the items; 
when there are an even number of items in an array, it ordinarily does not
(it does so only if the two central items happen to be equal).

An important property of the median, which will be referred to again,
is that it is influenced by the position of the items in the array but not
by the size of the items. It has already been observed that the median
of $5, $6, $7, $8, $10 is $7. The two larger items may have any values
greater than $7 and the two smaller items may have any values smaller
than $7, yet the median remains $7.
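
The rule for ungrouped data, including the convention for an even number of items, may be sketched in Python. The function below is an illustration written for this passage; Python's standard `statistics.median` follows the same convention and is used only as a cross-check.

```python
import statistics

def median(values):
    """Median of ungrouped data: the central item of the array, or the
    value halfway between the two central items when N is even."""
    ordered = sorted(values)          # the data must first be arrayed
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

dollars = [5, 6, 7, 8, 10]
inches = [2, 5, 6, 7, 9, 12]
print(median(dollars), median(inches))

# Position, not size: changing the outer items leaves the median at 7.
altered = [1, 2, 7, 50, 1000]
print(median(altered), statistics.median(altered))
```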

Before proceeding to a consideration of the computation of the median
for grouped data, let us compute the value of the median for the grades
of the 327 midshipmen arrayed in Table 26. We want to find the value
which is so located that 163 items will be on either side of it. This is, of
course, the value of the 164th item,⁴ and counting from either end reveals
that the value of the median is 77.5. If we had an array of 200 items, we
should find the value which divides the distribution so that 100 items fall
below and 100 above it. This is obviously the mean of the 100th and
101st items counted from either end of the array.

⁴ For ungrouped data it may seem convenient to find the value of the median by
counting $\frac{N + 1}{2}$ items, beginning with the highest (or lowest) item in the
array. This is not the same as saying that the median is the
$\left(\frac{N + 1}{2}\right)$th item. Although some persons hold this concept, it is
not satisfactory. The concept of the middle item as the median is unsatisfactory when
the array consists of an even number of items, and must be abandoned when the median
is determined from grouped data.

The median from grouped data. To determine the value of the median
of a frequency distribution, we count half of the frequencies from either
end of the distribution in order to ascertain the value on either side of
which half of the frequencies fall. To determine the value of the median
for the grades of the midshipmen (Table 40) we first compute
$\frac{N}{2} = 163.5$. We then proceed to ascertain the value of the median.
There are 122 frequencies included in the first four classes of the
distribution. The estimated value of the median is therefore obtained by
interpolating 41.5 frequencies (163.5 - 122) into the fifth class, assuming
that the frequencies in that class are evenly distributed within the class.
The median, then, is given by the expression

$$\text{Med} = 75.95 + \frac{41.5}{58} \times 2 = 75.95 + 1.43 = 77.38.$$

Exactly the same result is obtained if we begin our computations from
the other end of the distribution. There are 147 frequencies included in
the last 8 classes and we proceed to interpolate 16.5 frequencies (163.5 -
147) into the fifth class, from the upper limit toward the lower limit. The
result is

$$\text{Med} = 77.95 - \frac{16.5}{58} \times 2 = 77.95 - .57 = 77.38.$$

The value of the median is, of course, the same whether we begin our
computations from one end or the other.
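
The interpolation from either end may be sketched in Python. The function names are hypothetical, and the sketch assumes classes of equal width with frequencies spread evenly within the median class, as in the computation above.

```python
def median_from_below(frequencies, first_lower_limit, width):
    """Count N/2 frequencies upward from the lower limit of the first class."""
    half = sum(frequencies) / 2
    cumulative = 0
    for k, f in enumerate(frequencies):
        if cumulative + f >= half:
            lower = first_lower_limit + k * width
            return lower + (half - cumulative) / f * width
        cumulative += f

def median_from_above(frequencies, first_lower_limit, width):
    """Count N/2 frequencies downward from the upper limit of the last class."""
    half = sum(frequencies) / 2
    last_upper_limit = first_lower_limit + len(frequencies) * width
    cumulative = 0
    for k, f in enumerate(reversed(frequencies)):
        if cumulative + f >= half:
            upper = last_upper_limit - k * width
            return upper - (half - cumulative) / f * width
        cumulative += f

# Grades of midshipmen: first class limit 67.95, class interval 2
frequencies = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]
below = median_from_below(frequencies, 67.95, 2)
above = median_from_above(frequencies, 67.95, 2)
print(round(below, 2), round(above, 2))
```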

The value of 77.38 just obtained for the median from the frequency
distribution is in very close agreement with that of 77.5 found from the
array. Unless the data contain irregularities (accounted for by the
smallness of the sample), we can expect rather close agreement when dealing
with a continuous variable, and likewise for a discrete variable if the data
are not broken.

We have now computed the values of the arithmetic mean and the 
median for the frequency distribution of midshipmen’s grades. The mean
was 77.95. The median was 77.38. The mean exceeds the median
because the distribution is skewed to the right, on account of the presence
of a few high grades not offset by correspondingly low grades. If a dis- 
tribution is exactly symmetrical, the mean and the median are identical. 
If a distribution is skewed to the left, the mean will be less than the 
median. This point will be treated more fully at the end of this chapter 
and in the following chapter. 

The computation of the median from a frequency distribution of unequal
class intervals does not differ from that just described. Neither does the
presence of indeterminate groups at either or both ends complicate the
procedure. Referring to Table 41, we compute the median by first
determining $\frac{N}{2} = 164$, the number of items on either side of the
median. Since there are 155 frequencies included in the first 4 classes, we
must interpolate into the fifth class $\frac{9}{40}$ of the interval. This is

$$\text{Med} = \$2{,}000 + \frac{9}{40} \times \$500 = \$2{,}000 + \$112.50 = \$2{,}112.50.$$

This distribution shows much skewness to the right and, as is expected, the 
mean of $3,009 is much in excess of the median. As a matter of fact, we 
shall see in the following chapter that one way of measuring skewness
involves consideration of the difference between the mean and the median.

If an ogive of a distribution is plotted, it is possible to obtain the value
of the median graphically, as is shown in Chart 98. The process is the
graphic equivalent of the computations already made and consists of the
following steps: (1) Compute $\frac{N}{2}$ and locate this point on the
vertical scale. (2) Draw a perpendicular to the Y-axis at this point and
extend the perpendicular to intersect the ogive. (3) At the intersection,
drop a perpendicular to the X-axis. The point at which this perpendicular
meets the X-axis gives the value of the median. From Chart 98 it is seen
that, for the grades of midshipmen, the value of the median, located
graphically, is 77.4, which is in close agreement with that computed
arithmetically.

[Chart 98. Graphic Location of Median for Grades of 1937 Graduating Class of the
United States Naval Academy. (Data of Table 32.) Vertical scale: number of
midshipmen, 0 to 350; horizontal scale: 68 to 94.]

The quartiles, quintiles, deciles, and percentiles. The median charac- 
terizes a series of values because of its midway position. There are several 
other measures of the frequency distribution which, taken individually, 
are not measures of central tendency but, as we shall see later, may be 
used to assist in measuring dispersion and skewness. They are, however, 
allied to the median in that they are based upon their position in a series. 
We shall therefore digress at this point to discuss the quartiles, quintiles,
deciles, and percentiles.
There are three quartiles, which divide the distribution into four equal
parts. Thus $Q_1$ (the first quartile, or the lower quartile) is the value
located so that one-fourth of the items fall below it and three-fourths of
the items exceed it. $Q_2$ is, of course, the median and is generally so
designated. $Q_3$ (the third quartile, or the upper quartile) is the value so
located that three-fourths of the items fall below it and one-fourth exceed
it. To determine the value of $Q_1$ for the data of midshipmen’s grades
(Table 40), we count $\frac{N}{4} = \frac{327}{4} = 81.75$ frequencies from the
lower limit of the first class. Since 60 frequencies fall in the first three
classes, 21.75 frequencies (81.75 - 60) are interpolated into the fourth
class. Thus for the value of $Q_1$ we have

$$Q_1 = 73.95 + \frac{21.75}{62} \times 2 = 74.65.$$

The same result may be obtained by counting $\frac{3N}{4}$ from the upper
limit of the last class.

The value of the third quartile $Q_3$ may be computed by counting
$\frac{3N}{4}$ from the lower limit of the first class or, more expeditiously,
by counting $\frac{N}{4}$ from the upper limit of the last class. Since
$\frac{N}{4} = 81.75$, and since there are 60 frequencies in the last six
classes, we have

$$Q_3 = 81.95 - \frac{21.75}{35} \times 2 = 80.71.$$

There are four quintiles, which divide the distribution into five equal 
parts; nine deciles, which divide the distribution into ten equal parts; and 
ninety-nine percentiles, which divide the distribution into 100 equal parts. 
The procedure for computing these values is similar to that for the median 
and the quartiles. For example, we shall compute the value of the 3rd
decile, which is also the 30th percentile. We count
$\frac{3N}{10} = 98.1$ frequencies from the lower limit of the first group
and interpolate. Since there are 60 frequencies in the first 3 groups, we
have

$$73.95 + \frac{38.1}{62} \times 2 = 75.18.$$


Unless a distribution is very extensive, there would be no purpose served 
in computing the percentiles. As a matter of fact, we generally use only 
the 10th, 20th, 30th, etc., percentiles, which are, of course, the 1st, 2nd,
3rd, etc., deciles.
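The interpolation used for the quartiles and deciles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frequencies are those of the midshipmen's grades as they appear in Table 43, the true class limits are taken as 67.95, 69.95, and so on, and the function name `grouped_quantile` is our own, not part of any library.

```python
# Frequencies of the midshipmen's grades (classes 68.0-69.9, 70.0-71.9, ...).
LOWER_LIMITS = [67.95 + 2 * k for k in range(13)]  # assumed true lower limits
FREQS = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]
WIDTH = 2.0  # class interval


def grouped_quantile(p, lower_limits, freqs, width):
    """Value below which a fraction p of the items fall.

    Counts p * N frequencies from the lower limit of the first class and
    interpolates linearly within the class in which that count falls.
    """
    target = p * sum(freqs)
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("p must lie between 0 and 1")


q1 = grouped_quantile(0.25, LOWER_LIMITS, FREQS, WIDTH)
median = grouped_quantile(0.50, LOWER_LIMITS, FREQS, WIDTH)
q3 = grouped_quantile(0.75, LOWER_LIMITS, FREQS, WIDTH)
d3 = grouped_quantile(0.30, LOWER_LIMITS, FREQS, WIDTH)

print(round(q1, 2), round(median, 2), round(q3, 2), round(d3, 2))
```

Counting from the lower end only, as this sketch does, gives the same third quartile as counting one-fourth of the frequencies down from the upper limit of the last class.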

The terms quartile, quintile, decile, and percentile are sometimes used in
a different sense, to refer to the part of the distribution in which an item
falls. Thus, if a student is said to be in the upper quartile of his class,
he is in the upper 25 per cent. If he is in the upper decile of his class, he
is in the upper 10 per cent. It would undoubtedly lead to clarity of
expression if we reserved quartiles, quintiles, deciles, and percentiles to mean
the measures discussed at the opening of this section. To refer to the part
of a distribution in which a student falls, we could say "highest quarter"
(above $Q_3$), "second highest quarter" (between $Q_2$ and $Q_3$), "third highest
quarter" (between $Q_1$ and $Q_2$), and "lowest quarter" (below $Q_1$).
Similarly, we could say "fifths" in place of quintiles, "tenths" instead of deciles,
and "hundredths" or "percentages" instead of percentiles.

The Mode 

The mode from ungrouped data. The mode of a distribution is the
value at the point around which the items tend to be most heavily
concentrated. It may be regarded as the most typical of a series of values.
For this very reason it is apparent that the occurrence of one or a few
extremely high (or low) values has no effect upon the mode.* If a series
of data is unclassified, not having been either arrayed or put into a fre- 
quency distribution, the mode cannot be readily located. 

Taking first an extremely simple illustration : If seven men are receiving 
daily wages of $5, $6, $7, $7, $7, $8, $10, it is clear that the modal wage 
is $7 per day. If we have a series of values such as 

3, 5, 6, 7, 9, 10, 11 

it is apparent that there is no mode. 
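The two small series above can be checked with a short Python sketch. The helper `simple_modes` is a hypothetical name of our own; it reports the value or values occurring most often, and reports nothing when no value occurs more than once.

```python
from collections import Counter


def simple_modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every item occurs once, so no concentration is shown.
        return []
    return [value for value, count in counts.items() if count == highest]


wages = [5, 6, 7, 7, 7, 8, 10]
no_repeats = [3, 5, 6, 7, 9, 10, 11]

print(simple_modes(wages))       # the modal wage
print(simple_modes(no_repeats))  # no mode
```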

The mode from grouped data. If we examine the array of midshipmen’s 
grades shown in Table 26, we find that it would be very difficult to deter- 
mine the value around which the items tend to concentrate. Perhaps that 
value is somewhere around 76 or 77, but in coming to such a conclusion 
we find ourselves counting the number of grades from 75.0-75.9, from 
76.0-76.9, from 77.0-77.9, and from 78.0-78.9. The mode may be located 
more readily by referring to a frequency distribution such as Table 40. 
Here it is clear that the modal group is 74.0-75.9; and if we take the mid- 
value as representative of the class, we should call 74.95 the mode. 

* This is true in respect to the usual methods of locating the mode which
are described here. If the mode is located by the expression

$$\text{Mode} = \bar{X} - \frac{\sigma\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)},$$

the extreme values do have some slight influence. The computation of the
$\beta$'s is discussed in the following chapter.

However, there is evidence here that the mid-value is not the best
estimate of the mode. Since there are fewer frequencies in the class preceding
the modal class than there are in the class following the modal class, it is
logical to expect that the actual concentration is toward the upper limit
of the class. We shall make use of the frequencies in these two adjacent
classes to infer the probable concentration point within the modal class.
The expression is

$$Mo = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i,$$

where $l_1$ = the lower limit of the modal class;
$\Delta_1$ = the difference between the frequency of the modal class and the
frequency of the preceding class (sign neglected);
$\Delta_2$ = the difference between the frequency of the modal class and the
frequency of the following class (sign neglected);
$i$ = the interval of the modal class.

For the frequency distribution of grades of the midshipmen,

$$Mo = 73.95 + \frac{62 - 39}{(62 - 39) + (62 - 58)} \times 2 = 73.95 + \frac{23}{23 + 4} \times 2 = 75.65.$$

The interpolation which we have made may be illustrated graphically
as shown in Chart 99. The method which we have described is sometimes
called the difference method to distinguish it from another procedure
which is more usual but less satisfactory.* In any event it should be
realized that we are merely making an estimate of the value of the mode.
Nevertheless, it is a useful estimate and it should be remembered that the
mode has two important properties: first, that it represents the most
typical value of the distribution and should coincide with existing items;
second, that the mode (as usually computed) is not affected by the presence
of extremely large or small items.

Chart 99. Diagrammatic Illustration of the "Difference Method" of
Interpolating for the Modal Value. $\Delta_1$ exerts an upward influence, and
$\Delta_2$ exerts a downward influence, each in proportion to its magnitude, so
that the mode divides the interval of the modal class into two parts
proportional to $\Delta_1$ and $\Delta_2$. That is,

$$\frac{Mo - l_1}{l_2 - Mo} = \frac{\Delta_1}{\Delta_2}.$$

Geometrically, the mode may be located by dropping a vertical line from
the intersection of the two diagonals as shown on the diagram.

Algebraically the expression

$$Mo = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

may be developed as follows. We wish to locate the mode so that

$$\frac{Mo - l_1}{l_2 - Mo} = \frac{\Delta_1}{\Delta_2}.$$

Then

$$\Delta_2 Mo - \Delta_2 l_1 = \Delta_1 l_2 - \Delta_1 Mo,$$
$$\Delta_1 Mo + \Delta_2 Mo = \Delta_1 l_2 + \Delta_2 l_1,$$
$$Mo(\Delta_1 + \Delta_2) = \Delta_1 l_2 + \Delta_2 l_1.$$

But $l_2 = l_1 + i$, so that

$$Mo = \frac{\Delta_1 l_1 + \Delta_1 i + \Delta_2 l_1}{\Delta_1 + \Delta_2}
    = \frac{\Delta_1 l_1 + \Delta_2 l_1}{\Delta_1 + \Delta_2} + \frac{\Delta_1 i}{\Delta_1 + \Delta_2}
    = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i.$$

Graphically we may obtain the mode from a column diagram as in 
Chart 99. We may make a very rough approximation of the mode by 
reading the value on the X-axis corresponding to the highest point of the 
curve or corresponding to the steepest portion of the ogive. The curve 
may be smoothed freehand since, unless the series has been subjected to a 
smoothing process, we should obtain a value about the same as the mid- 
value of the modal group. 

Upon occasion, series are encountered which have two modes and are 
referred to as bi-modal. Such a series is pictured in Chart 100. Some- 
times bimodality is the result of chance; sometimes it results because of 
the fact that two sets of non-homogeneous data are present. In Chart 100 
the two concentrations are attributable to the fact that some drivers were 
on full (or nearly full) time work, while others were working only one or 
two days a week. 

* The usual procedure considers the frequency of the two adjacent classes. Thus for
Table 40 the 39 frequencies in the class preceding the modal class would be thought of
as exerting a downward pull, while the 58 frequencies in the class following the modal
class would be thought of as exerting an upward pull. Hence for the mode we should
have

$$Mo = 73.95 + \frac{58}{39 + 58} \times 2 = 75.15.$$

The objection to this method is that the interpolated value for the mode is held too
closely to the middle of the class. To take a rather extreme case, suppose we had a
distribution showing these central classes:

Class          f
15.0-19.9     30
20.0-24.9     70
25.0-29.9     69
30.0-34.9     28

Now it is apparent that the mode is very close to 24.95 and that the difference method
gives such a result:

$$19.95 + \frac{70 - 30}{(70 - 30) + (70 - 69)} \times 5 = 24.83.$$

The usual procedure, however, gives a figure which is clearly too low:

$$19.95 + \frac{69}{30 + 69} \times 5 = 23.43.$$

Characteristics of the Mean, Median, and Mode

Before proceeding to a consideration of other measures of central tend- 
ency, we shall examine the characteristics of these three relatively simple 
and very important measures. 

Familiarity of the concept. The arithmetic mean is the most widely
used of all the measures of central tendency. As will be pointed out later,
it is frequently used under conditions which cause it to be misleading.
Less well known than the arithmetic mean but very simple in concept is
the idea of the median as the value which has an equal number of items
on either side of it. Also less well known than the arithmetic mean, the
concept of the mode as the most usual or typical of a group of items is
probably the simplest of the three.

Chart 100. Distribution of Wages Received in Half Month by Drivers in Bituminous
Coal Mines, Illinois, 1933. (United States Bureau of Labor Statistics, Wages and
Hours of Labor in Bituminous-Coal Mining: 1933, Bulletin No. 601, p. 61.)
[Vertical axis: number of drivers.]

The concepts of the three measures may be illustrated by means of the 
charts on page 216. The mean is at the point of balance, or center of 
gravity, such that $\Sigma fx$ on one side of the mean equals $\Sigma fx$ on the other
side. The median divides the curve into two equal areas. The mode is
the value below the peak of the curve. 

Algebraic treatment* The arithmetic mean may be treated algebra- 
ically: 

(a) Since $\bar{X} = \frac{\Sigma X}{N}$, it follows that, if any two of the three
factors (the total, the arithmetic mean, the number of items) are known, the
third may be computed. Thus

$$\bar{X} = \frac{\Sigma X}{N}; \qquad \Sigma X = N\bar{X}; \qquad N = \frac{\Sigma X}{\bar{X}}.$$

(b) Using appropriate weights, a series of arithmetic means may be
averaged to yield the arithmetic mean of the combined distribution.

The median does not lend itself to 
the type of algebraic treatment dis- 
cussed for the arithmetic mean. Al- 
gebraic treatment of the mode, similar 
to that sketched for the mean, is not 
possible. 
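Both algebraic properties of the arithmetic mean can be illustrated briefly in Python. The figures below are hypothetical, chosen only for illustration, and `combined_mean` is a name of our own: each group mean is weighted by the number of items in its group, which amounts to recovering each group total as N times the mean.

```python
def combined_mean(means, counts):
    """Arithmetic mean of the combined distribution.

    Each group total is recovered as (number of items) x (group mean);
    the grand total is then divided by the total number of items.
    """
    grand_total = sum(mean * n for mean, n in zip(means, counts))
    return grand_total / sum(counts)


# Hypothetical example: two groups with means 70 and 80, of 10 and 30 items.
group_means = [70.0, 80.0]
group_sizes = [10, 30]

overall = combined_mean(group_means, group_sizes)
unweighted = sum(group_means) / len(group_means)

print(overall)     # weighted by group size
print(unweighted)  # ignores group size and differs from the true combined mean
```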

Need for classifying data. The ar- 
ithmetic mean may be computed from 
unclassified data, from arrayed data, 
from the frequency distribution, or (as 
noted above) merely from a knowledge 
of the total and the number of 
items N. When the arithmetic mean is computed from a frequency
distribution, the value of $\bar{X}$ will very closely approximate the value
of $\bar{X}$ for the unclassified data. The more nearly symmetrical the
distribution, the closer the agreement of these two values.

In order that the value of the median may be computed, the data must be in
an array (at least the central items must be arrayed) or in a frequency
distribution. The median determined from the frequency distribution will
agree approximately with that computed from the array if the distribution
of items is regular within the class containing the median.

[Charts: The Arithmetic Mean Is at the Point of Balance or Center of
Gravity. The Areas Are Equal on Either Side of the Median. The Mode Is
Below the Peak of the Curve.]

The mode is most readily located from the frequency distribution, and
only with some difficulty from an array. King* has pointed out that an
array of the cities of the United States according to population of each
would show no mode. However, if such data were put into classes, a modal
tendency might appear. It should be borne in mind that the process of
interpolating for the modal value within the modal group is at best only
an approximation. More refined methods of locating the mode involve
essentially the smoothing of the data by formula and the determination
of the X value of the maximum ordinate.

* Elements of Statistical Method, p. 126, Macmillan, New York.

Effect of unequal class intervals. When classes vary in width, the
value of the arithmetic mean may be computed. Such a variation of class
intervals is necessitated by the presence of marked skewness (almost
invariably to the right, or positive) resulting in a value for $\bar{X}$ which is not
closely in agreement with that based on the unclassified data. The value
of $\bar{X}$ from such a positively skewed frequency distribution would be
expected to exceed the value of $\bar{X}$ from the unclassified data.

The median may ordinarily be determined rather satisfactorily from a
frequency distribution having varying class intervals. The upper quartile
or one or more of the upper quintiles or deciles might, however, fall in a
wide class having few frequencies. The necessary interpolation would in
such a case be unreliable.

When the class intervals of a frequency distribution vary in width, the
mode may be satisfactorily located if the modal group and those on either
side of it are of the same width. Otherwise the determination is apt to
be of limited accuracy.

Effect of classes with open end. The presence of one or two
indeterminate classes in a frequency distribution results in an inaccurate
determination of $\bar{X}$, since mid-values ordinarily cannot be satisfactorily
determined for such classes.

The presence of indeterminate classes has no effect upon the determina- 
tion of the median. 

Indeterminate groups do not complicate the process of locating the
modal value. Occasionally, as when working with a J-shaped or reverse
J-shaped distribution, the mode is at or near the end of a distribution.
Under such conditions there would not be any reason for having an
indeterminate group at that end of the distribution. Incidentally, in the case
of J-, reverse J-, and U-shaped distributions the mode is not a measure
of central tendency.

Effect of skewness. For a symmetrical distribution the mean, median, 
and mode are identical. If the symmetrical distribution is altered on one 
side of the mode so as to be skewed, there is no necessary change in the 
value of the mode (as usually computed), but the median is changed in 
the direction of the skewness. Thus positive skewness (skewness to the 
right) increases the value of the median. The mean is increased even 
more, since it is affected not only by the fact that there is an excess of
frequencies on one side of the mode, but also by the amount by which
the various excess frequencies deviate from the mode. Although the
distribution of grades of the midshipmen is only slightly skewed, the effect
of the presence of skewness is seen when we recall that the mode is 75.65,
the median is 77.38, and the mean is 77.95. These values are shown on 
Chart 108 (p. 250). 

Experience with moderately skewed distributions of continuous data
indicates that the median falls about 2/3 of the distance on the horizontal
scale of a chart from the mode toward the mean (in the present instance
the fraction is about 3/4). Occasionally, therefore, the empirical formula
suggested by Karl Pearson,

$$Mo = \bar{X} - 3(\bar{X} - Med),$$

is used to obtain a rough estimate of the mode. For grades of the
midshipmen,

$$Mo = 77.95 - 3(77.95 - 77.38) = 76.24.$$

This estimate varies considerably from the more accurate estimate of
75.65, obtained earlier by the $\Delta$ method.
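The empirical relationship can be checked numerically with a brief Python sketch. The function name `pearson_mode` is our own; the three values are the mean, median, and difference-method mode of the midshipmen's grades quoted in this section.

```python
def pearson_mode(mean, median):
    """Rough estimate of the mode from Pearson's empirical formula."""
    return mean - 3 * (mean - median)


mean, median, mode_by_differences = 77.95, 77.38, 75.65

estimate = pearson_mode(mean, median)

# Fraction of the distance from the mode toward the mean at which the
# median actually falls for these data.
fraction = (median - mode_by_differences) / (mean - mode_by_differences)

print(round(estimate, 2), round(fraction, 2))
```

The formula itself assumes the fraction to be exactly 2/3, which is why its estimate departs from 75.65 here.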

Effect of extreme values. When skewness is not general but is due to 
a few items deviating a great deal from the mode, the median will be only 
slightly affected. The arithmetic mean, however, is affected by the value
of every item in the series, and the presence of one or a few extremely 
large (or extremely small) items in a series may result in a mean which is 
very misleading. As ordinarily computed, the mode is not at all influenced 
by the presence of a few unusually high (or low) extreme values. 

The foregoing is of such great importance that we shall give further 
attention to it. Suppose we have the following series of seven values 

$12, $14, $15, $15, $16, $18, $19 

the mean of which is $15.57, the median $15, and the mode $15. If an 
extreme value of $25 is added to these seven, the arithmetic mean becomes 
$16.75, the median $15.50, while the mode remains $15. Now if, instead 
of having added $25 as the eighth item, we add $200, the mean becomes 
$38.62, but the median is still $15.50 and the mode $15. The effect upon 
the median of any value from $16 to ∞ is the same. The mode was not
at all affected by the extreme value, although, if we had added a $16 item, 
it would have been affected. This illustrates a different point, also; 
namely, that the mode is not a useful measure unless it is based upon 
enough items to show a well-defined concentration. 
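The effect of the added item on each measure may be verified with Python's standard `statistics` module. This is merely a check of the arithmetic above; the variable names are our own.

```python
from statistics import mean, median, multimode

base = [12, 14, 15, 15, 16, 18, 19]

for extra in (None, 25, 200, 16):
    series = base if extra is None else base + [extra]
    # multimode lists every value that shares the highest frequency.
    print(extra, round(mean(series), 3), median(series), multimode(series))

with_25 = base + [25]
with_200 = base + [200]
with_16 = base + [16]
```

Adding $25 or $200 moves only the mean appreciably, whereas adding a $16 item leaves two values tied for the highest frequency, so the mode is no longer well defined.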

Because of the effect of extreme values upon the arithmetic mean, it is 
sometimes a misleading figure to use to describe a distribution. If we are 
considering the income of a group of people, and if most of them have
moderate incomes but one or a few have extremely high (or low) incomes,
the mean will reflect these extremes and to that extent will be atypical 
rather than typical. An alumni association recently made a study of the 
graduates of 20 years ago. Among other questions asked was one
concerning income during 1936. More than 350 questionnaires were sent out;
only 133 replies were received concerning income of 1936. There is a large 
probability that these replies were selective and any figures derived there- 
from would be of doubtful value. The mean income of the 133 replying 
was $13,958, but this high average was due to the fact that there were 
several very large incomes which were definitely extreme values. The 
median income was $7,500, while the mode was very close to $5,000. In 
such a case as this we should not use the mean alone to describe the dis- 
tribution. If only one figure is to be used it is better to use the median 
or mode, depending upon which concept is of more importance, that of 
the value which has an equal number of items on either side of it or that 
of the most usual. It would be much better, of course, to give all three 
values, and, if possible, a frequency distribution or a frequency curve. 

Sometimes in dealing with a series in which suspected heterogeneity is
present, it may be advisable to use the median in lieu of the mean. For
example, measurements might have been taken of the weight of a number
of goldfish and the figures may reveal the presence of several unusually
large specimens. It is suspected that, because of ignorance or carelessness,
the enumerator included a few carp with the goldfish. The questionable
values could be discarded. However, we are not sure that the heavy fish
were carp, and perhaps their measurements should not be discarded. The
use of the median allows the extreme values to be represented by their 
position in the series rather than by their size. 

Sometimes we have a series in which there are present extremes of which 
we know the number but not the individual values. In such a situation 
we can determine the median or the mode, but not the mean. 

When we have a series of values extending over a great range, any con- 
cept of a measure of central tendency is dubious. Suppose we have the 
values 4, 6, 2,000, and 2,100. It is obvious that a mean or a median 
could be computed but that neither would have any practical meaning.

Effect of irregularity of data. When data are broken or irregular, the
value of the mean computed from a frequency distribution may be
decidedly different from the value based on the unorganized data.

The same is true in the case of the median if gaps occur among the items
falling in the class containing the median. When gaps occur in the vicinity
of the median, the median is not a particularly good concept to use as its
value would be erratic if one or two items were added to or subtracted
from the series.

If a mode is clearly defined, there are not likely to be gaps near that
value. When gaps are present near the mode, it is quite likely that there 
are too few items in the series for the mode to be either clearly defined or 
meaningful. 

Reliability when based on samples. In Chapter XII we shall discuss 
the variation which may be expected in values of the mean when based 
on repeated random samples. At this point it will suffice to remark that 
the mean is more reliable than the median, and the median more reliable
than the mode. 

Mathematical properties. The arithmetic mean has two important
properties, where $x$ denotes the deviation of an item from the mean:
first, $\Sigma x = 0$; and second, $\Sigma x^2$ = a minimum. Because of this
latter property the mean is the usual basis of reference for measures of 
dispersion. The mean is an important function in many processes which 
will follow in later sections of this book. Among other uses it is essential 
for fitting the normal curve to observed data. 

The sum of the deviations from the median (signs neglected) is a mini- 
mum. For this reason certain measures of dispersion are sometimes based 
upon the median. 
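These properties may be checked numerically. The sketch below uses the seven wage values from the earlier illustration; the trial reference points and the helper names are our own choices, and the comparison over a handful of points is a demonstration, not a proof.

```python
from statistics import mean, median

values = [12, 14, 15, 15, 16, 18, 19]
m = mean(values)
med = median(values)


def sum_of_squares(reference):
    """Sum of squared deviations from a reference point."""
    return sum((v - reference) ** 2 for v in values)


def sum_of_absolute(reference):
    """Sum of deviations from a reference point, signs neglected."""
    return sum(abs(v - reference) for v in values)


# First property: deviations from the mean sum to zero.
sum_of_deviations = sum(v - m for v in values)

# Second property: squared deviations are smallest about the mean;
# absolute deviations are smallest about the median.
trial_points = [13, 14, 14.5, 15.5, 16, 17]
squares_minimal = all(sum_of_squares(m) <= sum_of_squares(t) for t in trial_points + [med])
absolute_minimal = all(sum_of_absolute(med) <= sum_of_absolute(t) for t in trial_points + [m])

print(abs(sum_of_deviations) < 1e-9, squares_minimal, absolute_minimal)
```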

Selection of appropriate measure. Using the foregoing measures as 
descriptive devices, the statistician may be faced with the problem of de- 
ciding which one to use to characterize a given set of data. In general, 
the measure of central tendency that he should use depends upon (1) the 
nature of the distribution of the data and (2) the concept of central tend- 
ency which is desired for a particular purpose. 

If the distribution is symmetrical, or approximately so, the three meas- 
ures may be used almost interchangeably. If a series is skewed, we must 
bear in mind that the mean is frequently not a typical value and that it 
may be better to use the mode (which is typical) or perhaps the median. 
When there are extreme deviations or when there is suspected hetero- 
geneity, we may use the median in place of the mean, or recourse may be 
had to a modified mean. 

If $\bar{X}$ is computed, use may be made of that value to obtain a total.
Thus, if adults average 150 pounds in weight, it is safe to load about 20
people in an elevator rated to carry 3,000 pounds. (The figure of 150
pounds is somewhat high for the average weight of adults, but it is the
figure frequently used to compute elevator capacity. It is obvious that
the 20 people referred to should not all be heavy persons.) If subsequent
computations are to be made, the mean may be required. If a curve is
to be fitted to a frequency distribution, the mean will probably be used.
If one series of data is eventually to be compared with another in respect
to dispersion, the mean may be needed. This, however, does not mean
that the median or the mode should not be used for describing either or
both of the series.

The relative standing of a person in a class may be indicated by stating 
whether he is better than half of the members. This rating involves the 
use of the median. Other statements referring to various proportions of 
the students may be made by using quartiles, quintiles, deciles, or
percentiles.

If we are interested in knowing the typical annual expenditure of motor- 
ists for gasoline, we should make use of the mode. 

Since the three measures embody different concepts, it may sometimes 
be advisable to use two or possibly all three. The use of the mean and 
the mode, or the mean and the median, gives us an idea of the amount of 
skewness present, as will be shown in the next chapter. 

Sometimes it is necessary to make a quick estimate of the central tend- 
ency of a series. Under such conditions the mode may be promptly esti- 
mated from a frequency distribution, and the median may be quickly 
approximated from either an array or a frequency distribution. Of course, 
if the total and the number of items are given, the arithmetic mean may
be computed in a few seconds. 

Minor Means 

The arithmetic mean, median, and mode are frequently thought of as 
the more important measures of central tendency, because of their wide 
usefulness, simplicity, and general applicability. Under certain conditions
other measures of central tendency may be useful, and we shall therefore
consider the geometric mean and the harmonic mean. As pointed out
earlier, the term "mean" is frequently used to designate the arithmetic
mean; consequently, when referring to any other mean such as the geo- 
metric mean or the harmonic mean, we should always refer to the measure 
by its complete designation. 

The geometric mean. The geometric mean is defined as "the Nth root
of the product of the items." Thus, for the four items 5, 8, 10, 12, the
geometric mean is

$$G = \sqrt[4]{5 \times 8 \times 10 \times 12} = \sqrt[4]{4800} = 8.3.$$

It is interesting to note that the arithmetic mean of these four items is
8.75. For any series of positive values (not all the same), the geometric
mean is smaller than the arithmetic mean.* When one of the values equals
zero, the geometric mean equals zero and is therefore inappropriate. If
one or more values are negative, the geometric mean can sometimes be
computed but may be meaningless. These are important drawbacks to its use.

* For a demonstration, see Appendix B, section IX-3.

Symbolically, the geometric mean is
$\sqrt[N]{X_1 \times X_2 \times X_3 \times \cdots \times X_N}$.
The computation is usually carried out by means of logarithms; thus

$$\log G = \frac{\log X_1 + \log X_2 + \log X_3 + \cdots + \log X_N}{N} = \frac{\Sigma \log X}{N}.$$

The logarithm of the geometric mean is thus the arithmetic mean of the
logarithms of the values.

When frequencies are present, each logarithm must be multiplied by the
corresponding frequency. Thus

$$\log G = \frac{f_1 \log X_1 + f_2 \log X_2 + f_3 \log X_3 + \cdots}{N} = \frac{\Sigma f \log X}{N}.$$

For a frequency distribution, the geometric mean is usually computed by: 
(1) ascertaining the logarithm of the mid-value of each class, (2)
multiplying each logarithmic mid-value by its proper frequency, (3) summing
these products, (4) dividing by the number of items, and (5) taking the 
anti-logarithm of the result. The procedure is illustrated in Table 43; 
here it is found that G = 77.825, a value lower than the arithmetic mean, 
which is 77.95. If a series is symmetrical in a logarithmic sense (see 
Chapter XI) and the items are evenly distributed within the classes geo- 
metrically instead of arithmetically, it is preferable to use the mid-values 
of the logarithms of the class limits rather than the logarithms of the mid- 
values of the classes. If raw data are available, it is, of course, advisable 
to make the class intervals logarithmically equal also. 
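The five steps just listed can be followed in a short Python sketch. It assumes the class mid-values and frequencies of Table 43 and uses common (base 10) logarithms, as the table does; the variable names are our own.

```python
import math

MID_VALUES = [68.95 + 2 * k for k in range(13)]  # 68.95, 70.95, ..., 92.95
FREQS = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]
N = sum(FREQS)

# Steps (1)-(3): logarithm of each mid-value, times its frequency, summed.
sum_f_log_x = sum(f * math.log10(x) for x, f in zip(MID_VALUES, FREQS))

# Step (4): divide by the number of items; step (5): take the anti-logarithm.
log_g = sum_f_log_x / N
geometric_mean = 10 ** log_g

# Arithmetic mean from the same mid-values, for comparison.
arithmetic_mean = sum(f * x for x, f in zip(MID_VALUES, FREQS)) / N

print(round(log_g, 4), round(geometric_mean, 2), round(arithmetic_mean, 2))
```

Small differences from the printed table in the last decimal places are to be expected, since the table rounds each logarithm to six places.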

It will be recalled that the arithmetic mean is the sum of the values
divided by the number, while the geometric mean is the Nth root of the
product of the values. As noted before, N times $\bar{X}$ gives $\Sigma X$. For the
geometric mean, $G^N = X_1 \cdot X_2 \cdot X_3 \cdot$ etc.; that is, the geometric mean
raised to the Nth power equals the product of the values. This leads to
the rather interesting point that any series of numbers having the same
N and the same $\Sigma X$ have the same arithmetic mean (for example, 1 and
11, 2 and 10, 4 and 8, 5 and 7, −2 and 14 all have an arithmetic mean of 6),
and that any series of numbers having the same N and the same product
have the same geometric mean (for example, 1 and 36, 2 and 18, 4 and 9
all have the geometric mean of 6).

TABLE 43

Calculation of Geometric Mean for Grades of the 1937 Graduating Class of
the United States Naval Academy

              Mid-values
  Grade       of classes      log X         f        f log X
                  X
68.0-69.9       68.95       1.838534        4       7.354136
70.0-71.9       70.95       1.850952       17      31.466184
72.0-73.9       72.95       1.863025       39      72.657975
74.0-75.9       74.95       1.874772       62     116.235864
76.0-77.9       76.95       1.886209       58     109.400122
78.0-79.9       78.95       1.897352       52      98.662304
80.0-81.9       80.95       1.908217       35      66.787595
82.0-83.9       82.95       1.918816       22      42.213952
84.0-85.9       84.95       1.929163       18      34.724934
86.0-87.9       86.95       1.939270       13      25.210510
88.0-89.9       88.95       1.949146        4       7.796584
90.0-91.9       90.95       1.958803        2       3.917606
92.0-93.9       92.95       1.968249        1       1.968249
Total                                     327     618.396015

$$\log G = \frac{618.396015}{327} = 1.891119, \qquad G = 77.825.$$

Another property of the geometric mean is that the product of the ratios
of the values on one side of the geometric mean to the geometric mean is
equal to the product of the ratios of the geometric mean to the values on
the other side of the geometric mean. To illustrate, let us take the values
4, 5, 20, 25, the geometric mean of which is $\sqrt[4]{10000} = 10$. The ratios of
the values 4 and 5 to the geometric mean are $\frac{4}{10}$ and $\frac{5}{10}$, while the ratios
of the geometric mean to the values 20 and 25 are $\frac{10}{20}$ and $\frac{10}{25}$. Thus we
have

$$\frac{4}{10} \cdot \frac{5}{10} = \frac{10}{20} \cdot \frac{10}{25},$$

$$\frac{1}{5} = \frac{1}{5}.$$

Similarly, we may reverse the ratios to write

$$\frac{10}{4} \cdot \frac{10}{5} = \frac{20}{10} \cdot \frac{25}{10},$$

$$5 = 5.$$

The following paragraphs discuss certain instances in which the
geometric mean is useful.

(1) The geometric mean may be used for averaging ratios. Consider
the following data:

                                               Ratio of          Ratio of
             Native-born    Foreign-born    foreign-born to   native-born to
Community    inhabitants    inhabitants       native-born      foreign-born
                                              (per cent)        (per cent)
    A           8,000           4,000              50               200
    B           1,500           3,000             200                50


The arithmetic mean of the two ratios of foreign-born to native-born 
population is 125 per cent. Likewise, the arithmetic mean of the two 
ratios of native-born to foreign-born population is 125 per cent! These 
two averages are inconsistent with each other. This incongruous result
does not occur if we use the geometric mean, for the geometric mean of
each of the two pairs of ratios is $\sqrt{50 \times 200} = 100$ per cent. We could, of
course, total or average the foreign-born inhabitants for the two
communities, and total or average the native-born inhabitants, thus obtaining
two ratios which are consistent. There are 7,000 foreign-born and 9,500
native-born inhabitants, or an average of 3,500 foreign-born and 4,750
native-born inhabitants. The ratio of foreign-born to native-born is

$$\frac{7,000}{9,500} = \frac{3,500}{4,750} = 73.7 \text{ per cent},$$

and the ratio of native-born to foreign-born is

$$\frac{9,500}{7,000} = \frac{4,750}{3,500} = 135.7 \text{ per cent}.$$


The product of these two ratios is 1. This arithmetic method, however, 
does not assign equal weight to the two ratios. Observe that the arith- 
metic method involves the ratio of the means (or totals), whereas the 
geometric procedure involves the geometric mean of the ratios. We have 
here two different concepts. Which one to use in a given situation depends 
upon the purpose. If we wish to establish a typical ratio for a number of 
communities and wish that ratio to be independent of the number of 
native-born or foreign-born persons present in the various places (that is, 
we wish to assign equal weight to each ratio), we may use the geometric 
mean of the ratios. If we wish to allow the populations to exert an influ- 
ence, we may determine the ratio of the totals or means. The question is
not whether to use an arithmetic or a geometric mean of the ratios, but 
whether to use a ratio based on arithmetic means (or totals) or a geometric 
mean of ratios. 
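The contrast between the two concepts can be laid out in a few lines of Python. This is only a sketch of the arithmetic for the two communities; the dictionary layout and function names are our own, and ratios are carried as decimals rather than per cents.

```python
import math

native = {"A": 8000, "B": 1500}
foreign = {"A": 4000, "B": 3000}


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))


foreign_to_native = [foreign[c] / native[c] for c in native]  # 0.50 and 2.00
native_to_foreign = [native[c] / foreign[c] for c in native]  # 2.00 and 0.50

# Arithmetic means of the ratios: both 1.25, which cannot both be right,
# since the two averages ought to be reciprocals of each other.
am_fn = arithmetic_mean(foreign_to_native)
am_nf = arithmetic_mean(native_to_foreign)

# Geometric means of the ratios: both 1.00, and reciprocal to each other.
gm_fn = geometric_mean(foreign_to_native)
gm_nf = geometric_mean(native_to_foreign)

# Ratio of the totals: lets the populations exert an influence.
ratio_of_totals = sum(foreign.values()) / sum(native.values())

print(am_fn, am_nf, gm_fn, gm_nf, round(ratio_of_totals, 3))
```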

If the two ratios of foreign-born to native-born are averaged
arithmetically but weighted according to the native-born populations, the result is
73.7 per cent. If the two ratios of native-born to foreign-born are
averaged arithmetically but weighted according to the foreign-born population,
we obtain 135.7 per cent. These figures, of course, agree with those ob- 
tained by taking the ratios of the totals. 

The geometric mean may be used when we wish to assign equal weight 
to equal ratios of change. Suppose (a) that two commodities are selling 
at $2 and $10 per unit; (b) that at a later date the first commodity doubles 
in price while the second one is halved in price, and thus they sell for $4 
and $5 respectively; and (c) that at a still later date the original price of 
the first commodity is halved and becomes $1, while that of the second 
commodity is doubled and becomes $20. The arithmetic mean under
these three situations yields: (a) $6; (b) $4.50; and (c) $10.50. The
geometric mean gives: (a) $4.47; (b) $4.47; and (c) $4.47. The assumption
used to justify the geometric mean is illustrated by saying that a doubling 
in price offsets a halving in price, a quadrupling in price offsets a price of 
one-fourth the original figure, and similarly for any other two ratios whose 
product is 1. This characteristic will be referred to again concerning a 
possible use of the geometric mean in connection with price index numbers. 
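
The three price situations above can be checked with a short Python sketch. The helper names `arithmetic_mean` and `geometric_mean` are illustrative only and do not come from the text; the prices are those of situations (a), (b), and (c).

```python
from math import prod

def arithmetic_mean(values):
    """Sum of the values divided by their number."""
    return sum(values) / len(values)

def geometric_mean(values):
    """N-th root of the product of N positive values."""
    return prod(values) ** (1 / len(values))

# Prices of the two commodities in situations (a), (b), and (c).
situations = {"a": (2, 10), "b": (4, 5), "c": (1, 20)}

for label, prices in situations.items():
    print(label,
          round(arithmetic_mean(prices), 2),
          round(geometric_mean(prices), 2))
# a 6.0 4.47
# b 4.5 4.47
# c 10.5 4.47
```

The geometric mean is unchanged because the product of the two prices is 20 in every situation; a doubling and a halving cancel in the product but not in the sum.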

(2) Sometimes a frequency distribution is encountered which is markedly
skewed to the right. If, instead of plotting the mid-values of the
classes, we use the logarithms of the mid-values (or better, plot the
logarithmic mid-values, the geometric mean of each pair of limits, on a
logarithmic X-scale) and a symmetrical distribution results, a geometric
analysis may be proper. This is discussed more fully in Chapter XI.

(3) Probably the most frequently used application of the geometric
principle has to do with the determination of average rates of change. If
a city had a population of 100,000 in 1920 and 120,000 in 1930, what was
the average annual rate of change? The rate of change was 20 per cent
over the entire period. If we take one-tenth* of that figure, or 2 per cent,
as the annual rate and compute a 2 per cent increase each year over the
preceding year, our 1930 population comes out as 121,900! Obviously
the correct figure is slightly smaller than 2 per cent, since we are actually
compounding. We may compute the average annual rate of change by
using

\[ P_n = P_0 (1 + r)^n, \]

where P_0 = population at beginning of period;
      P_n = population at end of period;
      r = rate of increase (or decrease) per year, expressed as a decimal;
      n = number of years.

* The 1920 census was taken as of January 1, 1920, while the 1930 census was taken
as of April 1, 1930. We should, therefore, actually consider 10¼ rather than 10 years
as the period between censuses.
For the data above

\[ 120{,}000 = 100{,}000 \, (1 + r)^{10}. \]

Solving this by the use of logarithms gives

\[ 5.079181 = 5.000000 + 10 \log (1 + r) \]
\[ \log (1 + r) = \frac{.079181}{10} = .0079181 \]
\[ 1 + r = 1.0184 \]
\[ r = 1.84 \text{ per cent.} \]


The expression P_n = P_0 (1 + r)^n is sometimes termed the compound
interest formula because of its usefulness in various problems involving
compound interest. We have used it above to determine average annual
rate of growth.* Knowing values of any three of the four symbols shown,
we can solve for the fourth. Thus we may determine:

(a) Average annual rate of change, r.

(b) Population a given number of years later, P_n, assuming a constant
rate.

(c) Number of years, n, until a given population will be attained, again
assuming a constant rate.

(d) Population a given number of years earlier, P_0, if the rate was
constant.

It should be noted that the assumption of a constant rate of change for
population is not valid over extended periods for any except possibly "new"
countries.
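
The four uses (a) to (d) of the compound interest formula can be sketched in Python as follows. The function names are hypothetical, chosen only for this illustration; each simply rearranges P_n = P_0 (1 + r)^n for one of the four symbols, under the same constant-rate assumption the text makes.

```python
from math import log

def annual_rate(p0, pn, n):
    """(a) Average annual rate r, given both populations and n years."""
    return (pn / p0) ** (1 / n) - 1

def population_after(p0, r, n):
    """(b) Population n years later at a constant rate r."""
    return p0 * (1 + r) ** n

def years_to_reach(p0, pn, r):
    """(c) Years until population pn is attained at a constant rate r."""
    return log(pn / p0) / log(1 + r)

def population_before(pn, r, n):
    """(d) Population n years earlier if the rate r was constant."""
    return pn / (1 + r) ** n

r = annual_rate(100_000, 120_000, 10)
print(round(100 * r, 2))                            # 1.84 (per cent)
print(round(population_after(100_000, 0.02, 10)))   # 121899, about 121,900
```

The second line of output reproduces the overshoot described in the text: a flat 2 per cent compounded for ten years gives about 121,900 rather than 120,000.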

The harmonic mean. The harmonic mean H is the reciprocal of the
arithmetic mean of the reciprocals of the values. The expression is

\[ H = \frac{1}{\dfrac{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_N}}{N}}
     = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_N}}. \]

For purposes of computation, it is more convenient to use the form

\[ \frac{1}{H} = \frac{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \cdots + \frac{1}{X_N}}{N}. \]

* In the above discussion we found the average rate of growth between two selected
points. Sometimes we wish to find the average rate of growth which best describes a
number of values for different years. Such an average is not dependent upon only
the first and last values of a series and is therefore more likely to be a representative
figure. A method of fitting a curve to obtain such an average is given in Chapter XTI.

The harmonic mean of the two values 3 and 12 is

\[ \frac{1}{H} = \frac{\frac{1}{3} + \frac{1}{12}}{2} = \frac{5}{24}, \]
\[ H = 4.8. \]

For these same values, the arithmetic mean is 7.5, while the geometric
mean is \sqrt{3 \times 12} = 6. For any series of values (not all the same
and not including zero as one value), the harmonic mean is smaller than
either the geometric or the arithmetic mean.*
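
As a quick check of the ordering just stated, the following Python sketch computes all three means for the values 3 and 12. The function names are illustrative and are not taken from the text.

```python
from math import prod

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # N-th root of the product of the values (all assumed positive)
    return prod(values) ** (1 / len(values))

def harmonic_mean(values):
    # Reciprocal of the arithmetic mean of the reciprocals
    return len(values) / sum(1 / v for v in values)

values = [3, 12]
am = arithmetic_mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)
print(round(hm, 2), round(gm, 2), round(am, 2))  # 4.8 6.0 7.5
```

For two values the geometric mean is also the geometric mean of the other two averages: 4.8 × 7.5 = 36 = 6².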

The harmonic mean is so rarely computed for a frequency distribution 
that we shall merely note the procedure, which consists of multiplying 
the reciprocal of each mid-value (or mid-value of the reciprocals of the 
class limits) by its frequency, adding these products, dividing by N, and
taking the reciprocal of the result. 

While the harmonic mean is not a measure of great importance, it is 
often confusing and hence we shall give a somewhat extended explanation 
and indicate several possible applications. 

Application (1). Although oranges are not usually priced in this fashion,
let us suppose that two grades of oranges are selling at 10 for $1 and 20
for $1. The arithmetic mean may be computed as

\[ \bar{X} = \frac{10 + 20}{2} = 15. \]

That is, 15 for $1, or $.067 per orange. This is the price we must pay per
orange if we spend equal amounts of money for each grade. Paying $.067
for each of 30 oranges, we shall spend $2.00 for the lot.

The harmonic mean gives a different result:

\[ H = \frac{1}{\dfrac{\frac{1}{10} + \frac{1}{20}}{2}} = 13\tfrac{1}{3}. \]

That is, 13⅓ for $1, or $.075 per orange. This is the price we must pay
per orange if equal numbers of oranges are bought at each price. Thus, if
we buy 15 oranges at 10 for $1 and 15 oranges at 20 for $1, we shall spend
$2.25 for all 30. Similarly, if we buy 30 oranges at $.075 each, we shall
spend $2.25 for the lot.

* See Appendix B, section IX-4.

The harmonic mean will give the same results as the arithmetic mean
if we weight by the quantities bought at each price. Thus

\[ H = \frac{30}{10\left(\frac{1}{10}\right) + 20\left(\frac{1}{20}\right)} = 15 \text{ oranges per \$1, or \$.067 per orange,} \]

assuming equal amounts of money spent for each grade.

If prices are quoted in the usual way, as so much per dozen, these oranges
are selling at $1.20 per dozen and $.60 per dozen. A simple arithmetic
mean results:

\[ \bar{X} = \frac{\$1.20 + \$.60}{2} = \$.90 \text{ per dozen, or \$.075 per orange.} \]

It is the same as the first harmonic mean, since we are assuming in our
computation that equal quantities are to be bought at each price. (Identical
results are obtained if the quotations are per orange instead of per
dozen oranges.) On the other hand, if we consider that 10 oranges may
be bought at $1.20 per dozen and 20 oranges may be bought at $.60 per
dozen, we have

\[ \bar{X} = \frac{(\$1.20 \times 10) + (\$.60 \times 20)}{30} = \$.80 \text{ per dozen, or \$.067 per orange.} \]



If prices are quoted in terms of price per unit, and the assumption is:

   Equal amounts of money spent for each grade or commodity:
      1. X̄, weighted by quantities for equal amounts of money
         (in this case units per dollar)
      2. H, weighted by dollars (or equally)

   Equal number of units of each grade or commodity bought at each price:
      I.  X̄, weighted by number of units (or equally)
      II. H, weighted by dollars for equal numbers of units
          (or price per unit)

If prices are quoted in terms of units per dollar, and the assumption is:

   Equal amounts of money spent for each grade or commodity:
      3. X̄, weighted by dollars (or equally)
      4. H, weighted by quantities for equal amounts of money
         (in this case units per dollar)

   Equal number of units of each grade or commodity bought at each price:
      III. X̄, weighted by dollars for equal numbers of units
           (or price per unit)
      IV.  H, weighted by number of units (or equally)

This result is the same as obtained in our first and third calculations, since
we have assumed that equal amounts of money are to be spent for each
grade of orange.
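
One way to see why the orange calculations agree is to go back to totals: the average price per orange is the total money spent divided by the total number of oranges bought. The sketch below does this directly. The function `price_per_orange` and the purchase lists are illustrative assumptions, not part of the original computations.

```python
def price_per_orange(purchases):
    """Average price paid per orange.

    `purchases` is a list of (oranges_bought, oranges_per_dollar) pairs.
    """
    total_oranges = sum(count for count, _ in purchases)
    total_dollars = sum(count / rate for count, rate in purchases)
    return total_dollars / total_oranges

# $1 spent on each grade: 10 oranges at 10 for $1, 20 oranges at 20 for $1.
equal_money = [(10, 10), (20, 20)]
# 15 oranges of each grade.
equal_numbers = [(15, 10), (15, 20)]

print(round(price_per_orange(equal_money), 3))    # 0.067
print(round(price_per_orange(equal_numbers), 3))  # 0.075
```

Whichever mean is used, and however the prices are quoted, a correctly weighted average must reproduce one of these two figures, depending on the assumption adopted.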

In the above illustrations the harmonic mean has furnished no information
not already available by use of the arithmetic mean. The harmonic
mean may be useful, however, when data are customarily or conveniently
given in terms of problems solved per minute, miles covered per hour,
units purchased per dollar, etc.

The arithmetic mean and the harmonic mean give consistent results if
proper consideration is given to (a) how the data are quoted and (b) what
weights are to be used. Taking prices as an illustration, the accompanying
table sets forth the relationships. Expressions 1, 2, 3, 4 give results
consistent with each other. Similarly, expressions I, II, III, IV give
consistent results.


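Consider commodity A as selling at 4 units for $1, or $.25 each, and
commodity B as selling at 10 units for $1, or $.10 each.

If equal amounts of money are to be spent for each commodity:

1.  \[ \bar{X} = \frac{(4 \times .25) + (10 \times .10)}{14} = \frac{2.00}{14} = \$.1429 \text{ per unit, or 7 for \$1.} \]

2.  \[ H = \frac{2}{\frac{1}{.25} + \frac{1}{.10}} = \frac{2}{14} = \$.1429 \text{ per unit, or 7 for \$1.} \]

3.  \[ \bar{X} = \frac{4 + 10}{2} = \frac{14}{2} = 7 \text{ for \$1, or \$.1429 per unit.} \]

4.  \[ H = \frac{14}{4\left(\frac{1}{4}\right) + 10\left(\frac{1}{10}\right)} = \frac{14}{2} = 7 \text{ for \$1, or \$.1429 per unit.} \]

If equal numbers of units are to be bought at each price:

I.   \[ \bar{X} = \frac{.25 + .10}{2} = \frac{.35}{2} = \$.175 \text{ per unit, or 5.71 for \$1.} \]

II.  \[ H = \frac{.35}{.25\left(\frac{1}{.25}\right) + .10\left(\frac{1}{.10}\right)} = \frac{.35}{2} = \$.175 \text{ per unit, or 5.71 for \$1.} \]

III. \[ \bar{X} = \frac{(4 \times .25) + (10 \times .10)}{.35} = \frac{2}{.35} = 5.71 \text{ for \$1, or \$.175 per unit.} \]

IV.  \[ H = \frac{2}{\frac{1}{4} + \frac{1}{10}} = \frac{2}{\frac{14}{40}} = 5.71 \text{ for \$1, or \$.175 per unit.} \]
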
From what has just been said it may be observed that (for either
assumption), when averaging fractions (ratios) by the arithmetic or harmonic
method, we use the arithmetic mean if weights are in the same terms as
the denominator, the harmonic mean if weights are in the same terms as
the numerator. Of course, if weights are in the same terms as the
numerator, they may be converted into terms of the denominator and the
arithmetic mean employed.

Suppose that a transaction consists of 40 handkerchiefs sold at 10 for
$1 and 60 handkerchiefs sold at 20 for $1. Now we are not interested in
either of the assumptions mentioned above. What we desire is the mean
price when 40 handkerchiefs sell at 10 for $1 and 60 sell at 20 for $1.
Using the quotations as given (that is, in terms of number of units per
dollar) we may use the harmonic mean with quantity weights. Thus

\[ H = \frac{100}{40\left(\frac{1}{10}\right) + 60\left(\frac{1}{20}\right)} = \frac{100}{7} = 14\tfrac{2}{7} \text{ per \$1, or \$.07 each.} \]

Still using the quotations in terms of units per dollar, we may obtain the
same result by employing the arithmetic mean, if our weights are amounts
of money spent for each grade. Thus

\[ \bar{X} = \frac{(10 \times 4) + (20 \times 3)}{7} = \frac{100}{7} = 14\tfrac{2}{7} \text{ per \$1, or \$.07 each.} \]

If we shift our quotations to price per unit, we have 40 handkerchiefs sold
at $.10 each and 60 sold at $.05 each. Now, using the harmonic mean,
we weight by amounts of money spent for each grade. Thus

\[ H = \frac{7}{4\left(\frac{1}{.10}\right) + 3\left(\frac{1}{.05}\right)} = \frac{7}{100} = \$.07 \text{ each, or } 14\tfrac{2}{7} \text{ per \$1.} \]

Finally, using the arithmetic mean of prices per unit and weighting by
quantities sold, we have

\[ \bar{X} = \frac{(.10 \times 40) + (.05 \times 60)}{100} = \frac{7.00}{100} = \$.07 \text{ each, or } 14\tfrac{2}{7} \text{ per \$1.} \]
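
The four handkerchief calculations follow one rule: weights in the terms of the denominator call for the arithmetic mean, weights in the terms of the numerator for the harmonic mean. The sketch below applies that rule to the same figures. The two helper functions are illustrative names only and are not from the text.

```python
def weighted_arithmetic_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_harmonic_mean(values, weights):
    return sum(weights) / sum(w / v for v, w in zip(values, weights))

units_per_dollar = [10, 20]
price_per_unit = [0.10, 0.05]
quantities = [40, 60]   # handkerchiefs sold of each grade
dollars = [4, 3]        # money spent on each grade

# Quoted as units per dollar: numerator is units, denominator is dollars.
a = weighted_harmonic_mean(units_per_dollar, quantities)   # weights = numerator
b = weighted_arithmetic_mean(units_per_dollar, dollars)    # weights = denominator

# Quoted as price per unit: numerator is dollars, denominator is units.
c = weighted_harmonic_mean(price_per_unit, dollars)        # weights = numerator
d = weighted_arithmetic_mean(price_per_unit, quantities)   # weights = denominator

print(round(a, 2), round(b, 2))  # 14.29 14.29 (units per $1)
print(round(c, 2), round(d, 2))  # 0.07 0.07 (dollars each)
```

All four figures describe the same transaction: $7 received for 100 handkerchiefs.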


Application (2). Occasionally a frequency distribution may be encountered
which is so skewed to the right that, when plotted in terms of the
reciprocals of the class mid-values, it assumes an approximately normal
form. In such instances harmonic treatment may be indicated. Such
cases are rather unusual, however, and will not be treated in this book.
Application (3). An interesting and apparently valid application of the
harmonic mean is given in an article by Holbrook Working.* In his
study of the factors influencing the price of potatoes, Working uses the
harmonic mean, because, as he points out, a low price during part of a
season will be compensated only by a disproportionally high price during
the remainder of the season. To illustrate we have selected the monthly
prices for one crop year and have shown them in Chart 101. When the
reciprocals or the logarithms are plotted, the curve is straighter than
when the arithmetic values are plotted, the reciprocals giving perhaps
the most nearly straight line. This indicates that the harmonic mean is
not inappropriate as a measure of the average price of potatoes during
a season.

Chart 101. Price of Potatoes per Bushel in Minneapolis and St. Paul,
September 1919-May 1920: A. Price, B. Logarithm of Price, C. Reciprocal
of Price. (Data from Holbrook Working, p. 40.)

It is sometimes argued that the geometric mean should be used for
series of data having a definite lower limit and an indefinite upper limit.
One type of such data is price relatives, which, having a base of 100,
may fall to 0 but rise to ∞. The question is not so much one of the
existence of such limits as it is one of what values may actually occur and
how they are approached (arithmetically, geometrically, or reciprocally):
whether, if we are dealing with a frequency distribution, the series is
approximately symmetrical in terms of X, skewed but approximately
symmetrical in terms of log X, or skewed but approximately normal in terms
of 1/X.

* Holbrook Working, Factors Determining the Price of Potatoes in St. Paul and
Minneapolis, Technical Bulletin 10, University of Minnesota Agricultural Experiment
Station, pp. 9 and 10.

For a further discussion of the harmonic mean, see R. W. Burgess, Introduction to
the Mathematics of Statistics, pp. 90-96, Houghton Mifflin Co., Boston, 1927.
In an arithmetic sense, a price drop of 33.3 per cent is offset by a price
rise of 33.3 per cent, a decline of 50 per cent is offset by a rise of 50 per
cent, and a fall of 90 per cent is offset by a rise of 90 per cent. Thus

\[ \frac{66.7 + 133.3}{2} = 100, \]
\[ \frac{50 + 150}{2} = 100, \]
\[ \frac{10 + 190}{2} = 100. \]

In a geometric sense, a price drop of 33.3 per cent is offset by a rise of
50 per cent, a fall of 50 per cent is offset by a rise of 100 per cent, and a
drop of 90 per cent is offset by a rise of 900 per cent. Thus

\[ \sqrt{66.7 \times 150} = 100, \]
\[ \sqrt{50 \times 200} = 100, \]
\[ \sqrt{10 \times 1000} = 100. \]

In a reciprocal sense, a price drop of 33.3 per cent is offset by a rise of
100 per cent, a fall of 50 per cent is offset by a rise to ∞, and a fall of
more than 50 per cent cannot be offset by any rise however great. Thus

\[ \frac{2}{\frac{1}{66.7} + \frac{1}{200}} = 100, \]
\[ \frac{2}{\frac{1}{50} + \frac{1}{\infty}} = 100. \]

There are a number of other measures of central tendency which are
of mathematical and theoretical rather than of practical interest. The
quadratic mean

\[ \sqrt{\frac{\sum X^2}{N}} \]

is the square root of the arithmetic mean of the squares of the values.
Unless all the values are the same, the quadratic mean exceeds the
arithmetic mean. The quadratic mean is mentioned here because the concept
is important. Although we do not use the term "quadratic" or "mean,"
we shall shortly compute the quadratic mean of the deviations from the
arithmetic mean. It will not be a measure of central tendency, but a
measure of dispersion; we shall call it the standard deviation, or σ, and
its expression is

\[ \sigma = \sqrt{\frac{\sum x^2}{N}}. \]
CHAPTER X

DISPERSION, SKEWNESS, AND KURTOSIS 



In the preceding chapter we considered certain measures which
attempted to describe the central tendency of a frequency distribution.
There are other aspects of frequency distributions which are also
important. First we shall consider the dispersion, or spread of the data. Two
counties may each show an average yield of wheat of 15 bushels to the
acre; but, if the data are considered farm by farm, one county may exhibit
extreme values ranging from 10 to 20 bushels per acre, while the other may
show yields as low as 5 bushels per acre and as high as 25 bushels per
acre. If such a crude measure of dispersion may be used, it is apparent
that there is greater uniformity of yield in the first county. Chart 102
shows two symmetrical curves which have the same mean but which differ
in respect to dispersion.

[Chart 102. Two curves having different dispersions.]

If a frequency curve or frequency distribution is not symmetrical, it is
said to be skewed, or asymmetrical. Most frequency distributions exhibit
more or less skewness. Chart 103 shows two curves, one of which is
symmetrical and one of which is skewed. The skewed curve is skewed to
the right, the direction in which the excess tail appears.

Chart 103. A Curve Skewed to the Right (Solid Line) and a Symmetrical Curve
(Dashed Line).

A measure of kurtosis indicates the degree to which a curve of a
frequency distribution is peaked or flat-topped. Our basis of reference is
the normal curve described and discussed in Chapter XI. Chart 104
shows a normal curve and a curve which is more peaked than normal;
Chart 105 shows a normal curve and a flat-topped curve.

Chart 104. A Peaked or Leptokurtic Curve (Solid Line) and a
Normal or Mesokurtic Curve (Dashed Line).

Chart 105. A Flat-Topped or Platykurtic Curve (Solid Line) and a Normal
or Mesokurtic Curve (Dashed Line).

Measures of Absolute Dispersion

The mean annual temperature at Boise, Idaho, is 50.9 degrees. The
mean annual temperature at Seattle, Washington, is 51.0 degrees or almost
exactly the same as at Boise. These two figures do not, however, suffice to
characterize this aspect of the climatic conditions of the two cities. The
temperature at Boise has been known to fall as low as -28 degrees and to
rise as high as 121 degrees. In Seattle the lowest recorded temperature
is 3 degrees and the highest is 98 degrees. It is quite apparent that there
is greater variability of temperature at Boise than at Seattle.

Let us consider a second illustration. A buyer for a large department
store has been offered two types of electric lights for use in the store. The
salesmen each claim about the same average length of life for their bulbs.
The buyer obtains from a testing laboratory test data for 40-watt lamps of
the two makes and finds that the average life of each of the two kinds of
bulbs is about 1,000 hours. Examining the data further, however, shows
that in one batch of bulbs a lamp burned out at 325 hours while one lasted
1,570 hours. In the other batch one lamp lasted but 105 hours, while one
did not burn out until the expiration of 2,910 hours. This limited
information indicates a greater degree of uniformity among lamps of the
first batch.
The range. The measurement of dispersion may be made in a crude
form by referring to the lowest and the highest values, as was done in the
preceding paragraphs. This is a very simple and easy-to-understand
measure. The range gives a comprehensive value for the data in that it
includes the limits within which all of the items occurred. However, the
range has one important disadvantage. Because the range is based upon
the two extreme items of a series, it is misleading if either one (or both)
of those extremes is an unusual occurrence. The range is occasionally
used for continuous data when we have a large number of items, but
should be avoided when we are handling broken data. Another objection
to the use of the range as a measure of dispersion arises when we are
comparing two distributions with different total frequencies. The series which
includes the larger number of items is more likely to include some very
low or high values or both; therefore the range is apt to increase as N
increases.

TABLE 44

Calculation of the Average Deviation for Grades of the 1937 Graduating
Class of the United States Naval Academy
(X̄ = 77.95)

                Mid-values    Number of     Deviation of mid-values
                of classes    midshipmen    from X̄, |x| = |X - X̄|
   Grade            X             f         (signs neglected)          f|x|

 68.0-69.9        68.95           4                9.00                  36
 70.0-71.9        70.95          17                7.00                 119
 72.0-73.9        72.95          39                5.00                 195
 74.0-75.9        74.95          62                3.00                 186
 76.0-77.9        76.95          58                1.00                  58
 78.0-79.9        78.95          52                1.00                  52
 80.0-81.9        80.95          35                3.00                 105
 82.0-83.9        82.95          22                5.00                 110
 84.0-85.9        84.95          18                7.00                 126
 86.0-87.9        86.95          13                9.00                 117
 88.0-89.9        88.95           4               11.00                  44
 90.0-91.9        90.95           2               13.00                  26
 92.0-93.9        92.95           1               15.00                  15

 Total                          327                                   1,189

\[ AD = \frac{\sum f|x|}{N} = \frac{1{,}189}{327} = 3.64. \]


Referring to the midshipmen's grades in Table 44, it is observed that
the range is 67.95 (the lower limit of the first class) to 93.95 (the upper
limit of the last class). If we have the array to refer to as in Table 26,
the range may be given a little more accurately as 68.75 to 92.15. The
range from the frequency distribution merely tells us that no one in the
1937 class received a grade below 67.95 or above 93.95. The range is
usually stated as the difference between the two extreme values. For the
midshipmen, 93.95 - 67.95 = 26.00. However, if only this single figure
is given, we do not know whether the range is from 0 to 26, or from 70
to 96, or what the limits may be.

The 10-90 percentile range. Sometimes we are interested in knowing
the range within which a certain proportion of the items fall. One such
range, which is frequently used in educational measurements, is the 10-90
percentile range. This measure excludes the lowest 10 per cent and the
highest 10 per cent, giving the two values between which the central 80
per cent of the items occur. Of course, the 10th percentile is the 1st
decile, and the 90th percentile is the 9th decile. The measure is usually
referred to, however, as the 10-90 percentile range rather than the 1-9
decile range, since the former carries more clearly the idea of the central
80 per cent.

For the midshipmen's grades, we interpolate N/10 = 32.7 from the lower
limit of the series to obtain P₁₀:

\[ P_{10} = 71.95 + \frac{11.7}{39} \times 2 = 72.55; \]

and N/10 from the upper limit of the series to obtain P₉₀:

\[ P_{90} = 85.95 - \frac{12.7}{18} \times 2 = 84.54. \]

The 10-90 percentile range is thus 72.55 to 84.54, or 11.99; and we know
that 10 per cent had grades below 72.55, 10 per cent had grades above
84.54, while 80 per cent had grades between these values.
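
The interpolation for P₁₀ and P₉₀ can be sketched in Python using the class frequencies of Table 44. The function `percentile_from_groups` is a hypothetical helper; it always counts upward from the lowest class, which gives the same P₉₀ as counting down from the top, and it assumes, as the text does, that items are spread evenly within each class.

```python
def percentile_from_groups(lower_limit, width, freqs, p):
    """Interpolate the p-th percentile (0-100) of a frequency distribution.

    `lower_limit` is the lower limit of the first class and `width` the
    common class interval.
    """
    target = sum(freqs) * p / 100
    cumulative = 0
    for k, f in enumerate(freqs):
        if cumulative + f >= target:
            class_lower = lower_limit + k * width
            return class_lower + (target - cumulative) / f * width
        cumulative += f
    return lower_limit + len(freqs) * width

# Frequencies of the thirteen classes in Table 44; first class starts at 67.95.
freqs = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]

p10 = percentile_from_groups(67.95, 2, freqs, 10)
p90 = percentile_from_groups(67.95, 2, freqs, 90)
print(round(p10, 2), round(p90, 2), round(p90 - p10, 2))  # 72.55 84.54 11.99
```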

The quartile deviation. In Chapter IX mention was made of Q₁ and Q₃,
the lower and the upper quartiles. A measure of dispersion based upon
these values is termed the quartile deviation, or the semi-interquartile range.
It is given by

\[ Q = \frac{Q_3 - Q_1}{2}. \]

If a series is symmetrical, it is clear that Q₁ and Q₃ are equidistant from
the median. Therefore, if we measure ±Q from the median, we include
50 per cent of the items of the series, for we have measured back to Q₁
and Q₃. If a series is skewed, as is usually true, we may take ±Q around
the median, and, while we shall not arrive at either Q₁ or Q₃, we may
expect to include approximately 50 per cent of the items unless the
skewness is great.

In the preceding chapter it was found, for the midshipmen's grades, that
Q₁ = 74.65; Med = 77.38; Q₃ = 80.71.

Computing the quartile deviation gives

\[ Q = \frac{80.71 - 74.65}{2} = 3.03. \]

We therefore expect to find about half of the midshipmen's grades within
77.38 ± 3.03, or between 74.35 and 80.41. Let us interpolate into the
proper classes of Table 44 to see if this is true. First we want to know
about how many of the 62 midshipmen in the fourth class had grades from
74.35 to 75.95. That is given by

\[ \frac{75.95 - 74.35}{2.00} \times 62 = \frac{1.60}{2.00} \times 62 = 49.6. \]

We include all midshipmen in the next two classes, 58 + 52 = 110, and for
the seventh class

\[ \frac{.46}{2.00} \times 35 = 8. \]

The number included between 74.35 and 80.41 is therefore

49.6 + 110 + 8 = 167.6, or 51.3 per cent.

Neither the 10-90 percentile range nor the quartile deviation is affected
by extreme values as is the range. However, these two measures have
another shortcoming, in that they do not consider all of the values in
measuring dispersion. The values below Q₁ (or above Q₃) could be massed
closely together or spread out widely; the effect upon Q would be the same.

Occasionally, use is made of the inter-quartile range, Q₃ - Q₁. For the
midshipmen's grades,

\[ Q_3 - Q_1 = 80.71 - 74.65 = 6.06. \]
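
The check that about half the grades lie within the median ± Q can be sketched as follows. The quartiles and median are taken as given in the text; `count_between` is a hypothetical helper that spreads each class frequency evenly over its interval, which is the same assumption the hand interpolation makes.

```python
def count_between(lower_limit, width, freqs, lo, hi):
    """Estimate how many items lie between lo and hi by interpolation."""
    total = 0.0
    for k, f in enumerate(freqs):
        left = lower_limit + k * width
        right = left + width
        # Length of the part of this class that falls inside [lo, hi]
        overlap = max(0.0, min(hi, right) - max(lo, left))
        total += f * overlap / width
    return total

freqs = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]  # Table 44
q1, median, q3 = 74.65, 77.38, 80.71                      # from the text

q = (q3 - q1) / 2
inside = count_between(67.95, 2.0, freqs, median - q, median + q)
print(round(q, 2), round(q3 - q1, 2))          # 3.03 6.06
print(round(100 * inside / sum(freqs), 1))     # 51.3
```

The unrounded count is about 167.65 rather than the 167.6 obtained above, because the text rounds the seventh-class share of 8.05 to 8; the percentage is unaffected.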


The average deviation. The average deviation, or the mean deviation as
it is sometimes called, is usually measured in relation to the arithmetic
mean. The average deviation is obtained by taking the sum of the
deviations of the items from the arithmetic mean, without regard to signs,
and dividing by the number of items. It will be recalled that Σx = 0,
and it is for this reason that the signs of the various x values are neglected.
Thus

\[ AD = \frac{\sum |x|}{N}, \]

or for a frequency distribution

\[ AD = \frac{\sum f|x|}{N}, \]

where | | means that the signs are neglected. Table 44 indicates the
computation¹ of AD for grades of the midshipmen, and AD is found to be
3.64. Observe that this value is a little larger than Q, which was found
to be 3.03.

If a distribution is normal, 57.5 per cent of the items are included within
the range of X̄ ± AD. If the distribution is moderately skewed, this will
be found to be approximately true. For the midshipmen's grades,

X̄ ± AD = 77.95 ± 3.64 = 74.31 and 81.59,

which is a slightly wider range of values than was found for the median ± Q.

Let us interpolate into the frequency distribution to see what percentage
of the midshipmen fall within the limits of 74.31 and 81.59. The procedure
is similar to that employed in the preceding section on Q. First we
estimate the number of frequencies from 74.31 to 75.95 in the fourth class of
Table 44. Thus

\[ \frac{1.64}{2.00} \times 62 = 51. \]

Our range of values includes all of the frequencies of the fifth and sixth
classes, 58 + 52 = 110. Finally, we estimate the number of frequencies
from 79.95 to 81.59 in the seventh class. This is

\[ \frac{1.64}{2.00} \times 35 = 29. \]

Combining, 51 + 110 + 29 = 190, or 58.1 per cent of all the midshipmen.
This distribution, it will be remembered, is slightly skewed (see Chart 81
or 108). The reader may wonder why we do not refer to the array of
Table 26 rather than to the frequency distribution to ascertain the number
of grades between 74.31 and 81.59. We could do that, but to be consistent
we should also have computed the mean deviation from the data of Table
26 or 25. To do this, we use the mean of the unclassified data (77.94)
and the deviations of each grade from that mean.

Because the sum of the deviations (signs neglected) is a minimum when 
taken around the median, the mean deviation is sometimes computed in 
relation to the median. In practice, however, the mean is generally used 
and, if the series is symmetrical, the resulting AD is the same. 
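
The average deviation of Table 44 and the 57.5 per cent figure for a normal distribution can both be checked numerically. The sketch below is illustrative only. It recomputes the mean from the class mid-values instead of using the rounded 77.95, and it relies on the standard result that, for a normal distribution, AD equals σ times the square root of 2/π, a fact assumed here and not stated in the text.

```python
from math import erf, pi, sqrt

# Class mid-values and frequencies from Table 44.
mids = [68.95 + 2 * k for k in range(13)]
freqs = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, mids)) / n
ad = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / n
print(round(mean, 2), round(ad, 2))   # 77.95 3.64

# Share of a normal distribution lying within one AD of the mean:
# P(|z| < k) = erf(k / sqrt(2)), with k = AD / sigma = sqrt(2 / pi).
share = erf(sqrt(2 / pi) / sqrt(2))
print(round(100 * share, 1))          # 57.5
```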


¹ The computation of AD shown in Table 44 may be abbreviated somewhat by the
use of a short method, which is described in R. E. Chaddock, Principles and Methods
of Statistics, pp. 156-158, Houghton Mifflin Co., Boston, 1925. Because of the rather
infrequent use of AD the procedure is not given in this text. The reader is cautioned
that, to use this short method, the assumed mean must be taken as the mid-value of
the group in which the mean occurs, or, when the mean falls between two groups (as
for the midshipmen's grades), as the mid-value of either of those groups.
The standard deviation, ungrouped data. Instead of merely neglecting
the signs of the deviations from the arithmetic mean, we may square the
deviations, thereby making all of them positive. Thus we may have a
measure

\[ \sigma^2 = \frac{\sum x^2}{N}, \]

the variance or mean square deviation. (At a later point we shall use the
term variation to refer to Σx².) σ² is also known as the second moment,
π₂, of the distribution, since the deviations have been raised to the second
power. We shall make use of the variance in later sections of the book.

At this point we are interested in the square root of this measure,

\[ \sigma = \sqrt{\frac{\sum x^2}{N}}, \]

which is termed the standard deviation or, occasionally, the root-mean-
square deviation. It has been pointed out previously that Σx² is a
minimum when taken around the arithmetic mean and that the standard
deviation is always computed in reference to the arithmetic mean. As the
above expression indicates, the steps involved in computing σ are:

(1) Determine the deviation x of each item from X̄.

(2) Square these deviations.

(3) Total them.

(4) Divide this sum by N.

(5) Take the square root.

TABLE 45

Computation of Standard Deviation by Long Method
for Scores of 15 Persons in Recalling Trade
Names of Advertised Products

   Subject     Score X        x          x²

      1           12       -20.87      435.56
      2           21       -11.87      140.90
      3           21       -11.87      140.90
      4           23        -9.87       97.42
      5           27        -5.87       34.46
      6           28        -4.87       23.72
      7           30        -2.87        8.24
      8           34         1.13        1.28
      9           37         4.13       17.06
     10           39         6.13       37.58
     11           39         6.13       37.58
     12           39         6.13       37.58
     13           40         7.13       50.84
     14           49        16.13      260.18
     15           54        21.13      446.48

   Total         493                 1,769.78

Source: S. M. Newhall and M. H. Heim, "Memory Value of Absolute
Size in Magazine Advertising," Journal of Applied Psychology, Vol. 13,
1929, pp. 62-75. The above data were for advertisements of 150 square
inches each, and each was observed for 5 seconds. Maximum possible
score was 81.

\[ \bar{X} = \frac{493}{15} = 32.87. \]

\[ \sigma = \sqrt{\frac{1{,}769.78}{15}} = \sqrt{117.99} = 10.9. \]


The computation of σ for a series of ungrouped data is shown in Table 45.
This procedure involves the computation of x for every item, and would
be a rather laborious procedure if there were an appreciably larger number
of items. The value of σ may be obtained, without computing each x, by
means of the expression²

\[ \sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}. \]

Referring to Table 45, it will be observed that the value of X̄ was rounded
to two decimals and thus each value of x and x² is an approximation. If
X̄ and x are shown to sufficient digits, results by the two methods will be
the same. Here, both methods yield 10.9.

The computation of σ by the short method is illustrated in Table 46.
Notice that the correction (ΣX/N)² is subtracted. This is always true.
The sum of the squared deviations is least when taken around X̄. We,
however, took our deviations around some other value (0 in this instance)
and these squared deviations are therefore too large.

TABLE 46

Computation of Standard Deviation by Short
Method for Scores of 15 Persons in Recalling
Trade Names of Advertised Products

   Subject     Score X        X²

      1           12          144
      2           21          441
      3           21          441
      4           23          529
      5           27          729
      6           28          784
      7           30          900
      8           34        1,156
      9           37        1,369
     10           39        1,521
     11           39        1,521
     12           39        1,521
     13           40        1,600
     14           49        2,401
     15           54        2,916

   Total         493       17,973

Source: Same as Table 45.

\[ \sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2}
          = \sqrt{\frac{17{,}973}{15} - \left(\frac{493}{15}\right)^2} \]
\[ = \sqrt{1{,}198.20 - 1{,}080.22} = \sqrt{117.98} \]
\[ = 10.9. \]
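
The agreement of the long and short methods can be verified with the fifteen scores of Tables 45 and 46. This is a minimal sketch; `pstdev` from Python's standard `statistics` module (the population standard deviation, dividing by N) is used only as an independent check.

```python
from math import sqrt
from statistics import pstdev

scores = [12, 21, 21, 23, 27, 28, 30, 34, 37, 39, 39, 39, 40, 49, 54]
n = len(scores)
mean = sum(scores) / n

# Long method: deviations from the mean, squared, averaged, square-rooted.
sigma_long = sqrt(sum((x - mean) ** 2 for x in scores) / n)

# Short method: mean of the squares minus the square of the mean.
sigma_short = sqrt(sum(x * x for x in scores) / n - mean ** 2)

print(round(mean, 2))                                # 32.87
print(round(sigma_long, 1), round(sigma_short, 1))   # 10.9 10.9
print(round(pstdev(scores), 1))                      # 10.9
```

Because the mean is carried at full precision here, the two methods agree to many decimals and not merely to the 10.9 shown in the tables.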

The standard deviation, grouped data. Before considering the properties of $\sigma$, let us see how to compute $\sigma$ for a frequency distribution. Since frequencies are present,

$$\sigma = \sqrt{\frac{\sum fx^2}{N}},$$

where $x$ now represents the deviation of a class mid-value from the mean. Table 47 illustrates the computation of $\sigma$ for the midshipmen's grades. It is fairly obvious that this method, involving the determination of a number of $x$ values, is cumbersome. It is very unusual for the $x$ values to be integers as in Table 47. In this case it is because $\bar{X}$ happened to be a class limit. Consequently our illustration in Table 47 does not show the disadvantages of this method as clearly as might be desired. If the column of $x$ values had two decimals, the cumbersome nature of this procedure would be clearer.

A short method for $\sigma$ is available which allows us to take the mid-value of any class as the assumed mean, work with deviations $d$ around this value, and make the necessary correction. The expression is

$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}.$$

To further shorten the process the deviations are taken in terms of classes. The expression³ is

$$\sigma = i\sqrt{\frac{\sum f(d')^2}{N} - \left(\frac{\sum fd'}{N}\right)^2},$$

where $d'$ indicates the deviation of a class mid-value from the assumed mean in terms of classes and $i$ is the class interval. It is of interest to note that the correction factor $\left(\frac{\sum fd'}{N}\right)^2$ is the square of the correction factor $\frac{\sum fd'}{N}$ used in computing the arithmetic mean by the short method. The computation of $\sigma$ by this shorter procedure is shown in Table 48.

² For proof of this expression, see Appendix B, section X-1.
³ For demonstration, see Appendix B, section X-1.
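The long and short methods for grouped data can be compared in a few lines of Python. This is a sketch under stated assumptions: the frequencies and mid-values are those of Tables 47 and 48, the assumed mean is the mid-value 78.95 (the class with $d' = 0$), and the variable names are our own.

```python
import math

# Frequencies for the thirteen 2-point grade classes, 68.0-69.9 up to 92.0-93.9.
freqs = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]
mids = [68.95 + 2 * k for k in range(13)]
n = sum(freqs)

# Long method: deviations of the mid-values from the rounded mean 77.95.
mean = 77.95
sigma_long = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / n)

# Short method: deviations d' in class units from the assumed mean 78.95.
i = 2.0                                  # class interval
d = [k - 5 for k in range(13)]           # -5, -4, ..., +7
nu1 = sum(f * dk for f, dk in zip(freqs, d)) / n
nu2 = sum(f * dk * dk for f, dk in zip(freqs, d)) / n
sigma_short = i * math.sqrt(nu2 - nu1 ** 2)

print(round(sigma_long, 2), round(sigma_short, 2))
```

Both routes round to 4.51; the short method needs only small integers in its working columns.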


TABLE 47

Computation of Standard Deviation by Long Method for Grades of the 1937 Graduating Class of the United States Naval Academy
($\bar{X} = 77.95$)

X = mid-value of class; f = number of midshipmen; x = deviation of mid-value from $\bar{X}$.

Grade          X        f       x        fx      fx²
68.0-69.9    68.95      4     -9.00     -36      324
70.0-71.9    70.95     17     -7.00    -119      833
72.0-73.9    72.95     39     -5.00    -195      975
74.0-75.9    74.95     62     -3.00    -186      558
76.0-77.9    76.95     58     -1.00     -58       58
78.0-79.9    78.95     52      1.00      52       52
80.0-81.9    80.95     35      3.00     105      315
82.0-83.9    82.95     22      5.00     110      550
84.0-85.9    84.95     18      7.00     126      882
86.0-87.9    86.95     13      9.00     117    1,053
88.0-89.9    88.95      4     11.00      44      484
90.0-91.9    90.95      2     13.00      26      338
92.0-93.9    92.95      1     15.00      15      225
Total                 327                       6,647

$$\sigma = \sqrt{\frac{\sum fx^2}{N}} = \sqrt{\frac{6{,}647}{327}} = \sqrt{20.33} = 4.51.$$

Properties of the standard deviation. Of the various measures of abso- 
lute dispersion which have been mentioned the standard deviation (and 
its square, the variance) is by far the most important. It will be used in 
connection with various statistical methods described hereafter. One im- 
portant consideration is that it is one of the factors involved in the equa- 
tion for the normal curve and for various skewed curves, discussed in the 
following chapter. It is also used in testing the reliability of certain sta- 
tistical measures, in correlation, and in connection with business cycle 
analysis. 

The standard deviation is a much used measure of the spread of a series of data. If $\pm 1\sigma$ is measured from the arithmetic mean of a normal distribution, 68.27 per cent of the items are included; within the range of $\bar{X} \pm 2\sigma$, 95.45 per cent are included; and within $\bar{X} \pm 3\sigma$, 99.73 per cent,⁴ or nearly all, of the items are included. Chart 106 illustrates what has

TABLE 48

Computation of Standard Deviation by Short Method for Grades of the 1937 Graduating Class of the United States Naval Academy

f = number of midshipmen; d' = deviation from the assumed mean in class units.

Grade          f      d'     fd'    f(d')²
68.0-69.9      4     -5     -20      100
70.0-71.9     17     -4     -68      272
72.0-73.9     39     -3    -117      351
74.0-75.9     62     -2    -124      248
76.0-77.9     58     -1     -58       58
78.0-79.9     52      0       0        0
80.0-81.9     35      1      35       35
82.0-83.9     22      2      44       88
84.0-85.9     18      3      54      162
86.0-87.9     13      4      52      208
88.0-89.9      4      5      20      100
90.0-91.9      2      6      12       72
92.0-93.9      1      7       7       49
Total        327     ...   -163    1,743

$$\sigma = i\sqrt{\frac{\sum f(d')^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} = 2.0\sqrt{\frac{1{,}743}{327} - \left(\frac{-163}{327}\right)^2} = 2.0\sqrt{5.0818} = 4.51.$$


just been said. The percentages just given refer to a curve of the normal type. If the distribution is skewed, these percentages will be only approximately realized. For the midshipmen's grades (Table 48), $\bar{X} \pm \sigma$ is $77.95 \pm 4.51$, or 73.44 to 82.46. Using the procedure described for $Q$ and $AD$, we find, interpolating into the proper classes, that within these limits are found 222.5 of the 327 grades, or 68.04 per cent of the total. Within $\bar{X} \pm 2\sigma$ (that is, from 68.93 to 86.97), we find 311.7, or 95.32 per cent, of the grades. The range $\bar{X} \pm 3\sigma$ runs from 64.42 to 91.48, and includes 325.5, or 99.54 per cent, of the grades. Even though this distribution is skewed, it is apparent that there is substantial agreement with the proportions expected for a normal distribution. It may be observed that the lower value of the $\bar{X} \pm 3\sigma$ range extends below the actual lower limit of the data. This may occur when a distribution is skewed to the right as is this one.

⁴ See Appendix E, which gives the areas of one-half of the normal curve (68.27 is twice 34.13447; 95.45 is twice 47.72499; 99.73 is twice 49.86501).
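The interpolation into classes can be sketched in Python as follows. This is an illustration under assumptions, not the authors' worksheet: class boundaries are taken at 67.95, 69.95, and so on (midway between the stated class limits), items are assumed to be spread evenly within each class, and the mean and $\sigma$ are carried unrounded.

```python
import math

freqs = [4, 17, 39, 62, 58, 52, 35, 22, 18, 13, 4, 2, 1]
width = 2.0
lower = [67.95 + width * k for k in range(13)]   # assumed class boundaries
n = sum(freqs)

# Mean and sigma by the short method (assumed mean 78.95, class units).
d = [k - 5 for k in range(13)]
nu1 = sum(f * dk for f, dk in zip(freqs, d)) / n
nu2 = sum(f * dk * dk for f, dk in zip(freqs, d)) / n
mean = 78.95 + width * nu1
sigma = width * math.sqrt(nu2 - nu1 ** 2)

def count_within(lo, hi):
    """Number of items between lo and hi, interpolating linearly in each class."""
    total = 0.0
    for f, a in zip(freqs, lower):
        overlap = max(0.0, min(hi, a + width) - max(lo, a))
        total += f * overlap / width
    return total

counts = {k: count_within(mean - k * sigma, mean + k * sigma) for k in (1, 2, 3)}
for k, c in counts.items():
    print(k, round(c, 1), round(100 * c / n, 2))
```

Under these assumptions the counts come out close to the 222.5, 311.7, and 325.5 grades quoted in the text.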

[Chart 106. Proportion of Items Included Within ±1σ, ±2σ, and ±3σ of the Arithmetic Mean in a Normal Curve.]

In dealing with the normal curve in later chapters, we shall not confine ourselves to the proportionate areas included within $\pm\sigma$, $\pm 2\sigma$, and $\pm 3\sigma$ of the mean, but shall consider any desired multiple of $\sigma$. Thus it may be seen from Appendix E (which gives areas of one-half of a normal curve) that 50 per cent of the items, or area of the curve, would be within $\pm .6745\sigma$ of the mean and that 90 per cent of the items would be within $\pm 1.645\sigma$ of the mean.
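In place of a printed table of normal areas, the same figures can be obtained from Python's standard library. The sketch below uses `statistics.NormalDist` (Python 3.8 or later); the helper names `area_within` and `multiple_for` are our own.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, sigma 1

def area_within(k):
    """Proportion of the area lying within +/- k sigma of the mean."""
    return 2 * z.cdf(k) - 1

def multiple_for(area):
    """Multiple of sigma that encloses the given central proportion."""
    return z.inv_cdf(0.5 + area / 2)

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
print(round(multiple_for(0.50), 4), round(multiple_for(0.90), 3))
```

The printed values reproduce the 68.27, 95.45, and 99.73 per cent figures and the multiples .6745 and 1.645.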

The standard deviation measures the dispersion of a series; the greater the spread of the series, the greater the value of $\sigma$. As a measure of uniformity of the characteristic measured, the smaller the value of $\sigma$, the greater the uniformity. To avoid this inverse relationship, a modification referred to as a measure of precision is sometimes used, especially with reference to the precision of a series of physical measurements. This measure is

$$h = \frac{1}{\sigma\sqrt{2}}.$$

It is not often used in statistical work in the social sciences.

The value of $\sigma$ which we have computed is a measure of the dispersion of the data (usually a sample) upon which it is based. It is not a measure, or an estimate, of the dispersion of the population from which the sample was drawn. The problem of estimating the dispersion of the population will be considered in Chapter XII.
Comparison of measures of absolute dispersion. We have considered
the range, the quartile deviation, the mean deviation, and the standard 
deviation. It was seen that the range is useful only for the crudest sort 
of statement. The quartile deviation and mean deviation, while occa- 
sionally serviceable, are not so valuable to us as the standard deviation, 
which appears as a necessary or useful element in curve fitting, testing 
the reliability of means and other measures, time series analysis, and 
correlation. 

In the case of a normal distribution it was pointed out that 50 per cent of the items were present within $\pm 1Q$ of the median (or mean), 57.5 per cent within $\pm 1AD$ of the mean, and 68.27 per cent within $\pm 1\sigma$ of the mean. For a normal distribution these measures consequently bear fixed relations to each other, as follows:⁵

$Q = .6745\,\sigma.$
$AD = .7979\,\sigma.$
$Q = .8453\,AD.$
$\sigma = 1.2533\,AD.$
$AD = 1.1830\,Q.$
$\sigma = 1.4826\,Q.$
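These fixed relations can be reproduced directly. The sketch below is an illustration, not the book's derivation (which interpolates into Appendix E): it relies on two standard facts about the normal curve, namely that the quartile deviation is the 75th-percentile multiple of $\sigma$ and that the mean deviation equals $\sigma\sqrt{2/\pi}$.

```python
import math
from statistics import NormalDist

q_over_sigma = NormalDist().inv_cdf(0.75)   # quartile deviation / sigma
ad_over_sigma = math.sqrt(2 / math.pi)      # mean deviation / sigma

print(round(q_over_sigma, 4), round(ad_over_sigma, 4))
print(round(q_over_sigma / ad_over_sigma, 4), round(1 / ad_over_sigma, 4))
print(round(ad_over_sigma / q_over_sigma, 4), round(1 / q_over_sigma, 4))

# Proportion of a normal distribution within +/- 1 AD of the mean.
within_ad = 2 * NormalDist().cdf(ad_over_sigma) - 1
print(round(100 * within_ad, 1))
```

The ratios agree with the tabulated constants to about three decimals; small differences in the fourth place arise because the book's derived ratios are computed from the rounded values .6745 and .7979.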

Measures of Relative Dispersion 

In the preceding paragraphs we have discussed measures of absolute 
dispersion, all of which are expressed in terms of the units of the problem, 
which may be dollars, pounds, inches, percentages, etc. When we wish 
to compare the dispersions of two or more series, it may or may not be 
desirable to use such a measure. The comparison of dispersions of two 
or more series resolves itself into three possible situations. 

(1) The series to be compared may be expressed in the same units, and the means may be the same, or nearly the same, in size. The grades of the midshipmen showed a mean of 77.95 and a standard deviation of 4.51. If another Naval Academy class showed $\bar{X} = 78.01$ and $\sigma = 3.75$, it is clear that the second class would exhibit less dispersion.

(2) The series to be compared may be expressed in the same units, but the arithmetic means may differ. Some years ago the Goodyear Tire and Rubber Company developed a new type of cord for automobile tires which was designated as "Supertwist." The Supertwist cord was superior to ordinary cord in that it could stretch more and had a longer flex life. Tests made on cord as received from the cotton mill and prior to fabrication into tires showed for the flex life of Supertwist cord

$\bar{X} = 138.64$ minutes, and $\sigma = 15.27$ minutes;

while for regular cord the figures were

$\bar{X} = 87.66$ minutes, and $\sigma = 14.12$ minutes.

If we compare the two $\sigma$ values, it appears that Supertwist cord is more variable in respect to flex life than is regular cord. However, it must be noted that the average flex life of Supertwist is much greater than that of regular cord. Taking this factor into consideration, we may set up a measure of relative dispersion,

$$V = \frac{\sigma}{\bar{X}}.$$

This is the coefficient of variation and is usually expressed as a percentage. For the Supertwist cord

$$V = \frac{15.27}{138.64} = 0.1101, \text{ or 11.0 per cent;}$$

while for regular cord

$$V = \frac{14.12}{87.66} = 0.1611, \text{ or 16.1 per cent.}$$

It is thus apparent that the relative variation in flex life is much less for Supertwist cord than for regular cord.

⁵ The first two relationships may be obtained by interpolating into Appendix E. The others are computed from them.

Chart 107 also illustrates the comparison of dispersions of two series having different mean values. In section A are shown the curves of two distributions having the same absolute dispersions but different relative dispersions. In section B are curves of two distributions having quite different absolute dispersions but the same relative dispersions. If the zero is shown on the horizontal scale, as in Chart 107, a very rough visual impression may be had of the relative dispersion of a series. For this reason some statisticians think it is desirable to show the zero on the horizontal scale. This does not seem to be a very important matter, however, since relative dispersion can at best be visualized only very roughly by this device. Occasionally frequency distributions are formed with class intervals expressed, not in terms of original units, but as percentages of the mean, the interval being some convenient figure, such as 10 per cent of the mean. If two such distributions are plotted on one chart, it is easy to compare visually their relative dispersions.

[Chart 107. Comparisons of Dispersions of Series Having Different Arithmetic Means. A. Same absolute dispersion, different relative dispersion: left-hand curve, $\bar{X} = 33$, $\sigma = 10$, $V = 30.3$ per cent; right-hand curve, $\bar{X} = 101$, $\sigma = 10$, $V = 9.9$ per cent. B. Different absolute dispersion, same relative dispersion: left-hand curve, $\bar{X} = 50$, $\sigma = 5$, $V = 10$ per cent; right-hand curve, $\bar{X} = 100$, $\sigma = 10$, $V = 10$ per cent. (Sections A and B have different vertical scales since they are not intended to be compared. However, if the vertical scale of section B is expanded 50 per cent, all curves will have the same area.) Horizontal axis: X values; vertical axis: frequencies.]

(3) The series to be compared may be expressed in different units. In such a case the standard deviations cannot be directly compared. A study of a large number of male industrial workers⁶ revealed an average pulse rate of 81.1 beats per minute and a standard deviation of about 12.2 beats per minute. Measurements of height showed $\bar{X} = 66.9$ inches and $\sigma = 2.7$ inches. The measurements of height included a small number of men not measured as to pulse rate. Let us disregard this difficulty for the purposes of our illustration. Are the industrial workers more variable in respect to pulse rate or height? It is obvious that the two standard deviations, being in different units, cannot be compared. Computing the two coefficients of variation shows for pulse rate

$$V = \frac{12.2}{81.1} = .149, \text{ or 14.9 per cent,}$$

and for height

$$V = \frac{2.7}{66.9} = .040, \text{ or 4.0 per cent.}$$

It is clear that, for this group of men, pulse rate is subject to greater dispersion than is height.

⁶ Based on data in A Health Study of Ten Thousand Male Industrial Workers, pp. 45 and 59, United States Public Health Service, Public Health Bulletin 162.
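The coefficient of variation is simple enough to express as a one-line function. The following Python sketch uses a hypothetical function name and takes its figures from the regular-cord and height examples and from the caption of Chart 107.

```python
def coefficient_of_variation(sigma, mean):
    """Relative dispersion V = sigma / mean (multiply by 100 for per cent)."""
    return sigma / mean

v_regular_cord = coefficient_of_variation(14.12, 87.66)   # flex life, minutes
v_height = coefficient_of_variation(2.7, 66.9)            # inches

# Chart 107: section A has equal sigmas, section B has equal V.
v_chart_a = [coefficient_of_variation(10, 33), coefficient_of_variation(10, 101)]
v_chart_b = [coefficient_of_variation(5, 50), coefficient_of_variation(10, 100)]

print(round(100 * v_regular_cord, 1), round(100 * v_height, 1))
print([round(100 * v, 1) for v in v_chart_a], [round(100 * v, 1) for v in v_chart_b])
```

Because $V$ is a pure ratio, the cord figure in minutes and the height figure in inches can be placed side by side, which is the point of situation (3).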


Somewhat akin to our measurement of relative dispersion is the possibility of expressing a given value in terms of its divergence from the mean and also in terms of the dispersion of the series. Such a procedure is not especially useful when we are considering only one value or comparing two values from the same series. Its usefulness becomes apparent when we want to compare two values from different series and when those two series (1) differ in respect to $\bar{X}$ or $\sigma$, or both, or (2) are expressed in different units. Suppose that a certain student has made a grade of 180 on an intelligence test, and that his group showed $\bar{X} = 160$ and $\sigma = 15$. This same student made a grade of 86 in history, and the group showed $\bar{X} = 70$ and $\sigma = 12$. We are interested in knowing whether his relative standing is higher in the intelligence test or in history. In the intelligence test he was 20 points above the mean, and in history he was 16 points above the mean. These deviations, however, are not comparable, but may be rendered so by dividing by their respective standard deviations. Thus

Intelligence test: $\dfrac{X - \bar{X}}{\sigma} = \dfrac{180 - 160}{15} = \dfrac{+20}{15} = +1.33;$

History: $\dfrac{X - \bar{X}}{\sigma} = \dfrac{86 - 70}{12} = \dfrac{+16}{12} = +1.33.$

It is apparent that the student shows the same relative standing in each group, being $+1.33\sigma$ above the mean in each. The usefulness of this device is by no means limited to the educational field. It is, however, often used with test data and is then referred to as a "standard score."
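The standard score of the example can be written as a small Python function; the name `standard_score` is our own choice, and the figures are those of the student's two grades.

```python
def standard_score(value, mean, sigma):
    """Deviation from the mean expressed in units of the standard deviation."""
    return (value - mean) / sigma

z_intelligence = standard_score(180, 160, 15)
z_history = standard_score(86, 70, 12)
print(round(z_intelligence, 2), round(z_history, 2))
```

Both scores round to +1.33, although the raw deviations (20 points and 16 points) differ.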


Skewness 

When a series is not symmetrical, it is said to be asymmetrical or skewed. In Chart 103 we showed such a skewed curve in relation to a symmetrical one. The curve of midshipmen's grades (Chart 108) is also skewed. Our measures of skewness indicate not only the amount of skewness but also the direction. A series is said to be skewed in the direction of the extreme values, or, speaking in terms of the curve, in the direction of the excess tail. Thus the two curves referred to above are both skewed positively, or to the right. Most skewed curves encountered in the social sciences are skewed to the right. Only rarely do we find curves skewed to the left, such as those shown in Charts 85 and 109, and even more rarely do we find data characteristically skewed to the left.

[Chart 108. Grades of the 1937 Graduating Class of the United States Naval Academy, Showing Location of Mean, Median, and Mode. Horizontal axis: grade; vertical axis: number of midshipmen.]

Many series, however, are characteristically skewed to the right. Examples are frequency distributions of wages or salaries, use of electricity (see Chart 125, p. 293), weights of adult male human beings, and numerous other variables. Distributions of grades are apt to be moderately skewed to the right, or nearly symmetrical. In the case of the midshipmen's grades the skewness is partly due to the fact that we are considering only those men who had survived the previous three years, during which some of the less able had been dropped. The distribution of ages at death of the American inventors in Chart 109 may be characteristically skewed to the left, or the skewness may be due to the fact that a time factor is present: almost one-fifth of the inventors included in this study were born before 1800.

[Chart 109. Age at Death of 371 American Inventors. (Data from "Bio-Social Characteristics of American Inventors," by Sanford Winston, American Sociological Review, Vol. 2, No. 6, December 1937, pp. 837-849.) Horizontal axis: age in years; vertical axis: number of inventors.]

Pearsonian measure of skewness. It was pointed out in the preceding chapter that the mode is not influenced by the presence of extreme values, the median is influenced by their position only, and the arithmetic mean is influenced by the size of the extremes. Consequently we could make use of the mode and the mean to measure skewness. We might say then that skewness = mean − mode. But there are some shortcomings of such a measure. In the first place it is a measure of absolute skewness and would have much different meaning for a series of small dispersion than for a widely dispersed series. It is rather unusual for two (or more) series to have the same $\sigma$ and therefore we practically never use a measure of absolute skewness, preferring a measure of relative skewness. The measure just mentioned may be put into relative terms by dividing by $\sigma$. Now

$$\text{Skewness} = \frac{\bar{X} - Mo}{\sigma}.$$

This gives us a relative measure with positive sign when skewness is to the right, and with negative sign when skewness is to the left. There is, however, another important difficulty growing out of the fact that the mode is not very satisfactorily located for most frequency distributions. The median is rather satisfactorily located and therefore we use the measure⁷

$$Sk = \frac{3(\bar{X} - Med)}{\sigma}.$$

In the preceding chapter we found that, for the midshipmen's grades, the value of $\bar{X}$ was 77.95, while the median was 77.38. In this chapter the value of $\sigma$ was found to be 4.51. The skewness, then, is

$$Sk = \frac{3(77.95 - 77.38)}{4.51} = +.38.$$

⁷ The presence of the 3 in the expression is explained as follows: Karl Pearson showed empirically that in moderately skewed distributions of a continuous variable the median fell about 2/3 of the distance from the mode toward the mean. Consequently we have $Mo = \bar{X} - 3(\bar{X} - Med)$. Now if we substitute this expression for the mode in the measure of skewness, we have

$$Sk = \frac{\bar{X} - [\bar{X} - 3(\bar{X} - Med)]}{\sigma} = \frac{3(\bar{X} - Med)}{\sigma}.$$
TABLE 49

Computation of Various Measures for Age at Death of 371 American Inventors

Age at death in years      f      d'     fd'    f(d')²    f(d')³
 35 and under  40          3     -6     -18      108      -648
 40 and under  45          6     -5     -30      150      -750
 45 and under  50         12     -4     -48      192      -768
 50 and under  55         16     -3     -48      144      -432
 55 and under  60         26     -2     -52      104      -208
 60 and under  65         40     -1     -40       40       -40
 65 and under  70         50      0       0        0         0
 70 and under  75         56      1      56       56        56
 75 and under  80         62      2     124      248       496
 80 and under  85         55      3     165      495     1,485
 85 and under  90         25      4     100      400     1,600
 90 and under  95         17      5      85      425     2,125
 95 and under 100          2      6      12       72       432
100 and over*              1      7       7       49       343
Total                    371     ...   +313    2,483    +3,691

* This class assumed to have its mid-value at 102.5.

Source: Sanford Winston, "Bio-social Characteristics of American Inventors," American Sociological Review, Vol. 2, No. 6, December 1937, p. 848, and by correspondence.

$\frac{N}{2} = 185.5.$  $\frac{N}{4} = 92.75.$  $\frac{N}{10} = \frac{371}{10} = 37.1.$

$Q_1 = 60 + \frac{29.75}{40} \times 5 = 63.72$ years.

$Q_3 = 85 - \frac{47.75}{55} \times 5 = 80.66$ years.

$\text{Median} = 70 + \frac{32.5}{56} \times 5 = 72.90$ years.

$P_{10} = 55 + \frac{0.1}{26} \times 5 = 55.02$ years.

$P_{90} = 90 - \frac{17.1}{25} \times 5 = 86.58$ years.

$\bar{X} = 67.5 + \frac{313}{371} \times 5 = 71.72$ years.

$\sigma = i\sqrt{\frac{\sum f(d')^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} = 5\sqrt{\frac{2{,}483}{371} - \left(\frac{313}{371}\right)^2} = 12.23$ years.

$\pi_1 = 0.$

$\pi_2 = \nu_2 - \nu_1^2 = 6.692722 - (.843666)^2 = 5.980950.$

$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = +9.948787 - 3(.843666)(6.692722) + 2(.843666)^3 = -5.789483.$
This may be considered as a moderate degree of skewness since the measure varies within the limits⁸ of ±3. It should be added that values as large as 1 are rather unusual.

For the data of age at death of the American inventors, it is shown under Table 49 that $\bar{X} = 71.72$ years, while $Med = 72.90$ years and $\sigma = 12.23$ years. The Pearsonian measure of skewness is

$$Sk = \frac{3(71.72 - 72.90)}{12.23} = -.29.$$
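The Pearsonian measure is easily checked by machine. A minimal Python sketch follows; the function name is our own, and the inputs are the rounded means, medians, and standard deviations quoted in the text for the two series.

```python
def pearson_skewness(mean, median, sigma):
    """Pearsonian measure of skewness: 3 (mean - median) / sigma."""
    return 3 * (mean - median) / sigma

sk_grades = pearson_skewness(77.95, 77.38, 4.51)       # midshipmen's grades
sk_inventors = pearson_skewness(71.72, 72.90, 12.23)   # age at death of inventors
print(round(sk_grades, 2), round(sk_inventors, 2))
```

The sign gives the direction: positive for the grades (skewed to the right), negative for the inventors (skewed to the left).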

Measures of skewness based on quartiles and percentiles. It was previously pointed out that in a symmetrical distribution $Q_1$ and $Q_3$ lie equidistant from the median. If a series is skewed to the right, $Q_3$ is farther from the median than is $Q_1$. If a series is skewed to the left, $Q_1$ is farther from the median ($Q_2$) than is $Q_3$. An absolute measure of skewness, then, may be given by

$$(Q_3 - Q_2) - (Q_2 - Q_1) = Q_1 + Q_3 - 2Q_2.$$

This measure is put into relative terms by dividing by the quartile deviation. Thus as a measure of skewness we could use

$$\frac{Q_1 + Q_3 - 2Q_2}{\dfrac{Q_3 - Q_1}{2}}.$$

It can be shown⁹ that this value varies within the limits of ±2. It is perhaps a little more satisfactory to use as a measure of skewness

$$Sk_Q = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1},$$

which, of course, varies within the limits of ±1. We may call this the "quartile measure of skewness."

For the grades of the midshipmen, the values of $Q_1$, $Q_3$, and the median were previously obtained. Thus

$$Sk_Q = \frac{74.65 + 80.71 - 2(77.38)}{80.71 - 74.65} = +.10.$$

For the age at death of the American inventors, the values of the quartiles and the median are shown below Table 49. The skewness, then, is

$$Sk_Q = \frac{80.66 + 63.72 - 2(72.90)}{80.66 - 63.72} = -.08.$$

It is apparent that the upper 25 per cent and the lower 25 per cent of the data have only a positional influence in determining the value of this measure. All the items in the upper quarter might be closely clustered or they might be spread out over a wide range, yet their influence on $Sk_Q$ would be the same.

⁸ Harold Hotelling and Leonard M. Solomons ("The Limits of a Measure of Skewness," Annals of Mathematical Statistics, May 1932, pp. 141-142) have shown that $\dfrac{\bar{X} - Med}{\sigma}$ lies between ±1.

⁹ See Appendix B, section X-2. The maximum value in actual use will, of course, be less than 2.

A somewhat more sensitive measure of skewness may be based upon the deciles or percentiles. We could thus consider the 10th and 90th percentiles in relation to the 50th percentile (the median). The measure of absolute skewness would be $(P_{90} - P_{50}) - (P_{50} - P_{10})$, or $P_{90} + P_{10} - 2P_{50}$. If this measure is divided by the 10-90 percentile range (that is, $P_{90} - P_{10}$), we have a measure of relative skewness

$$Sk_P = \frac{P_{10} + P_{90} - 2P_{50}}{P_{90} - P_{10}},$$

which varies within the limits¹⁰ of ±1.

Computing $Sk_P$ for the midshipmen's grades, we refer to the earlier part of this chapter for $P_{10}$ and $P_{90}$, which have previously been computed. These were $P_{10} = 72.55$ and $P_{90} = 84.54$. For the measure of skewness, then,

$$Sk_P = \frac{72.55 + 84.54 - 2(77.38)}{84.54 - 72.55} = +.19.$$

The values of $P_{10}$ and $P_{90}$ for the age at death of the inventors are shown below Table 49. Computing the skewness gives

$$Sk_P = \frac{86.58 + 55.02 - 2(72.90)}{86.58 - 55.02} = -.13.$$

Measure of skewness based on the third moment. We have seen that the most satisfactory measure of dispersion is the standard deviation, which is based upon the second moment about the mean

$$\pi_2 = \frac{\sum x^2}{N} \quad \text{and} \quad \sigma = \sqrt{\pi_2} = \sqrt{\frac{\sum x^2}{N}}.$$

A very useful measure of skewness may be obtained by making use of the third moment about the mean

$$\pi_3 = \frac{\sum x^3}{N}.$$

It will be recalled that the first moment about the mean

$$\pi_1 = \frac{\sum x}{N}$$

is always zero. However, the third moment about the mean is not zero unless the distribution is symmetrical about the mean. Cubing a deviation does not change its sign. It does, however, have a disproportionately large effect on large deviations. As illustrations, consider the two sets of data given in Tables 50 and 51, the first of which is symmetrical around a mean of 6, while the second is not symmetrical around a mean of 6. Both sets of data have

$$\pi_1 = \frac{\sum x}{N} = 0,$$

and the data of Table 50 have

$$\pi_3 = \frac{\sum x^3}{N} = \frac{0}{5} = 0.$$

But the figures in Table 51 show

$$\pi_3 = \frac{\sum x^3}{N} = \frac{+30}{5} = +6.$$

¹⁰ The proof of this exactly parallels that given in Appendix B, section X-2. The maximum value in actual use will, of course, be appreciably less than 1.
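The two small series can be run through a general moment function. This Python sketch is only an illustration; `central_moment` is a hypothetical helper, and the two lists are the X values of Tables 50 and 51.

```python
def central_moment(values, order):
    """Moment of the given order about the arithmetic mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

symmetrical = [2, 4, 6, 8, 10]     # Table 50
asymmetrical = [3, 4, 6, 7, 10]    # Table 51

for series in (symmetrical, asymmetrical):
    print(central_moment(series, 1), central_moment(series, 3))
```

The first moment is zero for both series, while the third moment is zero only for the symmetrical one.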


TABLE 50

Computation of First and Third Moments of a Symmetrical Series

  X       x       x³
  2      -4      -64
  4      -2       -8
  6       0        0
  8      +2       +8
 10      +4      +64
Total     0        0

$\pi_1 = \dfrac{\sum x}{N} = \dfrac{0}{5} = 0.$   $\pi_3 = \dfrac{\sum x^3}{N} = \dfrac{0}{5} = 0.$

TABLE 51

Computation of First and Third Moments of an Asymmetrical Series

  X       x       x³
  3      -3      -27
  4      -2       -8
  6       0        0
  7      +1       +1
 10      +4      +64
Total     0      +30

$\pi_1 = \dfrac{\sum x}{N} = \dfrac{0}{5} = 0.$   $\pi_3 = \dfrac{\sum x^3}{N} = \dfrac{+30}{5} = +6.$

To compute the third moment of a frequency distribution,

$$\pi_3 = \frac{\sum fx^3}{N},$$

taking the actual deviations from the arithmetic mean, cubing them, multiplying by the frequencies, summing, and dividing by $N$, would be laborious. As shown in Appendix B (section X-1), the second moment or $\pi_2$ can be obtained by a short process. In terms of class intervals,

$$\pi_2 = \frac{\sum f(d')^2}{N} - \left(\frac{\sum fd'}{N}\right)^2.$$

The value of the third moment (also in terms of class intervals) is given by¹¹

$$\pi_3 = \frac{\sum f(d')^3}{N} - 3\,\frac{\sum fd'}{N}\cdot\frac{\sum f(d')^2}{N} + 2\left(\frac{\sum fd'}{N}\right)^3.$$

Or, letting $\nu_1 = \dfrac{\sum fd'}{N}$, $\nu_2 = \dfrac{\sum f(d')^2}{N}$, and $\nu_3 = \dfrac{\sum f(d')^3}{N}$,

$$\pi_2 = \nu_2 - \nu_1^2,$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3.$$

Obviously, $\pi_3$ is a measure of absolute skewness. It is put into relative terms by dividing by $\sigma^3$. The measure of relative skewness, based on the third moment,¹² is

$$\alpha_3 = \frac{\pi_3}{\sigma^3} = \frac{\pi_3}{\sqrt{\pi_2^3}},$$

where $\sigma$ is in class intervals. The symbol $\sqrt{\beta_1}$ is also used to identify this measure.

The values of the first, second, and third moments for the data of midshipmen's grades are computed in Table 52. From these we obtain

$$\alpha_3 = \frac{\pi_3}{\sqrt{\pi_2^3}} = \frac{+6.527531}{\sqrt{(5.081802)^3}} = +.57.$$

Similarly, the first three moments for the age at death of the American inventors have been computed in Table 49, showing

$$\alpha_3 = \frac{-5.789483}{\sqrt{(5.980950)^3}} = -.40.$$

If the distribution is symmetrical, $\alpha_3$, of course, is zero. Sometimes $\alpha_3^2$ or $\beta_1$ is used as a measure of skewness and

$$\beta_1 = \alpha_3^2 = \frac{\pi_3^2}{\pi_2^3}.$$

No definite upper limit is apparent for $\alpha_3$ or $\beta_1$, but values as great as ±2 for $\alpha_3$ indicate marked skewness.

¹¹ See Appendix B, section X-3.

¹² No previous mention has been made of $\alpha_1$ or $\alpha_2$. For any series of figures, $\alpha_1 = \dfrac{\pi_1}{\sigma}$ or $\dfrac{\pi_1}{\sqrt{\pi_2}} = 0$; $\alpha_2 = \dfrac{\pi_2}{\sigma^2}$ or $\dfrac{\pi_2}{\pi_2} = 1$.
TABLE 52

Computation of First Three Moments for Grades of the 1937 Graduating Class of the United States Naval Academy

f = number of midshipmen; d' = deviation from the assumed mean in class units.

Grade          f      d'     fd'    f(d')²    f(d')³
68.0-69.9      4     -5     -20      100      -500
70.0-71.9     17     -4     -68      272    -1,088
72.0-73.9     39     -3    -117      351    -1,053
74.0-75.9     62     -2    -124      248      -496
76.0-77.9     58     -1     -58       58       -58
78.0-79.9     52      0       0        0         0
80.0-81.9     35      1      35       35        35
82.0-83.9     22      2      44       88       176
84.0-85.9     18      3      54      162       486
86.0-87.9     13      4      52      208       832
88.0-89.9      4      5      20      100       500
90.0-91.9      2      6      12       72       432
92.0-93.9      1      7       7       49       343
Total        327     ...   -163    1,743      -391

$\nu_1 = \dfrac{\sum fd'}{N} = \dfrac{-163}{327} = -.498471.$

$\nu_2 = \dfrac{\sum f(d')^2}{N} = \dfrac{1{,}743}{327} = 5.330275.$

$\nu_3 = \dfrac{\sum f(d')^3}{N} = \dfrac{-391}{327} = -1.195719.$

$\pi_1 = 0.$

$\pi_2 = \nu_2 - \nu_1^2 = 5.330275 - (-.498471)^2 = 5.081802.$

$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = -1.195719 - 3(-.498471)(5.330275) + 2(-.498471)^3 = +6.527531.$

In the preceding chapter it was pointed out in footnote 5 that the value of the mode may be estimated by use of the expression

$$Mo = \bar{X} - \frac{\sigma\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}.$$

The final fraction of this expression is sometimes used as a measure of skewness. It is often called $\chi$ (chi) but, since we use $\chi$ in a different sense in the following chapter, we shall refer to it as $Sk_\beta$, and

$$Sk_\beta = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}.$$

The computation of $\beta_2$ is described in the following section. The sign for this measure of skewness is obtained by giving $Sk_\beta$ the same sign as $\pi_3$.
Kurtosis

Chart 110 shows a distribution which is more peaked than normal. Such a distribution is referred to as leptokurtic. A platykurtic, or flat-topped, distribution is shown in Chart 111. The normal curve is designated as mesokurtic.¹³ The degree of kurtosis present in a series may be measured by making use of the fourth moment,

$$\pi_4 = \frac{\sum x^4}{N},$$

or, for a frequency distribution,

$$\pi_4 = \frac{\sum fx^4}{N}.$$

[Chart 110. Cost of New 5-Room House and Lot to Purchaser, Cleveland, 1924, and Normal Curve Having Same $N$, $\bar{X}$, and $\sigma$. (Based on data of Table 53.) Vertical axis: number of houses.]

¹³ Kurtic = humpbacked; thus humped or unimodal. Lepto = slender, narrow. Platy = broad, wide, flat. Meso = in the middle, intermediate.

By a procedure similar to that given in Appendix B, section X-3, it may be shown that

$$\pi_4 = \frac{\sum f(d')^4}{N} - 4\,\frac{\sum fd'}{N}\cdot\frac{\sum f(d')^3}{N} + 6\left(\frac{\sum fd'}{N}\right)^2\frac{\sum f(d')^2}{N} - 3\left(\frac{\sum fd'}{N}\right)^4,$$

or, letting $\nu_4 = \dfrac{\sum f(d')^4}{N}$,

$$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4.$$

Now $\pi_4$ gives an absolute expression for kurtosis. This may be put into relative terms by dividing by $\sigma^4 = \pi_2^2$. The measure is known as $\alpha_4$ or $\beta_2$, and

$$\alpha_4 = \beta_2 = \frac{\pi_4}{\sigma^4} \text{ or } \frac{\pi_4}{\pi_2^2},$$

where $\sigma$ is in class intervals. This expression has a value of 3.0 for the normal curve. For a flat-topped curve, $\alpha_4 < 3.0$. For a peaked curve, $\alpha_4 > 3.0$.

[Chart 111. Length of Life of a Group of Electric Lamps and Normal Curve Having Same $N$, $\bar{X}$, and $\sigma$. (Based on data of Table 54. The tails of the normal curve are not shown. The left tail would cross the Y axis.) Vertical axis: percentage frequencies.]

The peaked curve of Chart 110 is shown in comparison with a normal curve having the same $N$, $\bar{X}$, and $\sigma$. In Table 53 the moments of this leptokurtic distribution have been computed and the value of $\alpha_4$ or $\beta_2 = 4.46$.

The flat-topped, or platykurtic, curve in Chart 111 is also shown in relation to a normal curve having the same $N$, $\bar{X}$, and $\sigma$. The moments of the flat-topped series are shown in Table 54 and from these $\alpha_4$ or $\beta_2$ is found to be 2.22.
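Both kurtosis figures can be recomputed from the frequency columns of Tables 53 and 54. The following Python sketch is an illustration with hypothetical names; it forms the four moments about the assumed mean in class units and applies the conversion $\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$.

```python
def grouped_beta2(freqs, d):
    """Relative kurtosis beta_2 = pi_4 / pi_2**2 from frequencies and
    class-unit deviations d' around an assumed mean."""
    n = sum(freqs)
    nu = [sum(f * x ** k for f, x in zip(freqs, d)) / n for k in range(5)]
    pi2 = nu[2] - nu[1] ** 2
    pi4 = nu[4] - 4 * nu[1] * nu[3] + 6 * nu[1] ** 2 * nu[2] - 3 * nu[1] ** 4
    return pi4 / pi2 ** 2

# Table 53: cost of new 5-room wood houses, mid-values $1,500 to $11,500.
houses_f = [2, 1, 2, 6, 16, 27, 16, 7, 3, 1, 1]
houses_d = list(range(-5, 6))

# Table 54: length of life of electric lamps, percentage frequencies.
lamps_f = [1.0, 1.5, 3.1, 4.4, 5.0, 5.7, 6.6, 7.3, 7.6, 7.8,
           7.8, 7.6, 7.3, 6.6, 5.7, 5.0, 4.4, 3.1, 1.5, 1.0]
lamps_d = list(range(-9, 11))

b2_houses = grouped_beta2(houses_f, houses_d)
b2_lamps = grouped_beta2(lamps_f, lamps_d)
print(round(b2_houses, 2), round(b2_lamps, 2))
```

The first value exceeds 3.0 (leptokurtic) and the second falls below it (platykurtic), matching the 4.46 and 2.22 quoted above.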


TABLE 53

Computation of First Four Moments for Cost of New 5-Room Wood House and Lot to Purchaser, Cleveland, 1924

Cost
(mid-values)     f     d'    fd'   f(d')²   f(d')³   f(d')⁴
$ 1,500          2    -5    -10      50     -250    1,250
  2,500          1    -4     -4      16      -64      256
  3,500          2    -3     -6      18      -54      162
  4,500          6    -2    -12      24      -48       96
  5,500         16    -1    -16      16      -16       16
  6,500         27     0      0       0        0        0
  7,500         16     1     16      16       16       16
  8,500          7     2     14      28       56      112
  9,500          3     3      9      27       81      243
 10,500          1     4      4      16       64      256
 11,500          1     5      5      25      125      625
Total           82    ...     0     236      -90    3,032

Source: Frank R. Garfield and William M. Hood, "Construction Costs and Real Property Values," Journal of the American Statistical Association, Vol. 32, No. 200, December 1937, p. 647. Data are those shown in Chart I for 5-room wood houses.

$\nu_1 = \dfrac{\sum fd'}{N} = \dfrac{0}{82} = 0.$

$\nu_2 = \dfrac{\sum f(d')^2}{N} = \dfrac{236}{82} = 2.878049.$

$\nu_3 = \dfrac{\sum f(d')^3}{N} = \dfrac{-90}{82} = -1.097561.$

$\nu_4 = \dfrac{\sum f(d')^4}{N} = \dfrac{3{,}032}{82} = 36.975610.$

$\pi_1 = 0.$

$\pi_2 = \nu_2 - \nu_1^2 = 2.878049.$

$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = -1.097561.$

$\pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 = 36.975610.$

$\beta_2 = \dfrac{\pi_4}{\pi_2^2} = \dfrac{36.975610}{(2.878049)^2} = 4.46.$

Note: The assumed mean ($6,500) and the mean coincide, resulting in a value of 0 for $\nu_1$. There are therefore no differences between the $\nu$ and $\pi$ values, since $\nu_1^2 = 0$, $\nu_1\nu_2 = 0$, $\nu_1^3 = 0$, $\nu_1\nu_3 = 0$, etc.

When a deviation is raised to a fourth or a second power, its sign be- 
comes positive. The fourth power increases extreme deviations dispro- 
portionately in comparison with raising them to the second power. Con- 
TABLE 54

Computation of First Four Moments for Length of Life of a Group of
Electric Lamps

Length of life   Percentage
in hours         frequencies
(mid-values)     f       d'      fd'    f(d')^2    f(d')^3     f(d')^4

    50           1.0     -9     -9.0      81.0     -729.0     6,561.0
   150           1.5     -8    -12.0      96.0     -768.0     6,144.0
   250           3.1     -7    -21.7     151.9   -1,063.3     7,443.1
   350           4.4     -6    -26.4     158.4     -950.4     5,702.4
   450           5.0     -5    -25.0     125.0     -625.0     3,125.0
   550           5.7     -4    -22.8      91.2     -364.8     1,459.2
   650           6.6     -3    -19.8      59.4     -178.2       534.6
   750           7.3     -2    -14.6      29.2      -58.4       116.8
   850           7.6     -1     -7.6       7.6       -7.6         7.6
   950           7.8      0      0         0          0           0
 1,050           7.8      1      7.8       7.8        7.8         7.8
 1,150           7.6      2     15.2      30.4       60.8       121.6
 1,250           7.3      3     21.9      65.7      197.1       591.3
 1,350           6.6      4     26.4     105.6      422.4     1,689.6
 1,450           5.7      5     28.5     142.5      712.5     3,562.5
 1,550           5.0      6     30.0     180.0    1,080.0     6,480.0
 1,650           4.4      7     30.8     215.6    1,509.2    10,564.4
 1,750           3.1      8     24.8     198.4    1,587.2    12,697.6
 1,850           1.5      9     13.5     121.5    1,093.5     9,841.5
 1,950           1.0     10     10.0     100.0    1,000.0    10,000.0

 Total         100.0           +50.0   1,967.2   +2,925.8    86,650.0


Source: Robley Winfrey and Edwin B. Kurtz, Life Characteristics of Physical Property, Bulletin 103,
Iowa Engineering Experiment Station, p. 58, Property Group 28-2.


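\[ \nu_1 = \frac{\Sigma f d'}{N} = \frac{+50.0}{100.0} = +.50. \]

\[ \nu_2 = \frac{\Sigma f (d')^2}{N} = \frac{1{,}967.2}{100.0} = 19.672. \]

\[ \nu_3 = \frac{\Sigma f (d')^3}{N} = \frac{+2{,}925.8}{100.0} = +29.258. \]

\[ \nu_4 = \frac{\Sigma f (d')^4}{N} = \frac{86{,}650.0}{100.0} = 866.500. \]

\[ \pi_1 = 0. \]

\[ \pi_2 = \nu_2 - \nu_1^2 = 19.672 - (.50)^2 = 19.422. \]

\[ \pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 29.258 - 3(.50)(19.672) + 2(.50)^3 = 0. \]

\[ \pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 = 866.500 - 4(.50)(29.258) + 6(.50)^2(19.672) - 3(.50)^4 = 837.3045. \]

\[ \beta_2 = \frac{\pi_4}{\pi_2^2} = \frac{837.3045}{(19.422)^2} = 2.22. \]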
sequently the narrower the shoulders of a distribution and the longer the
tails, the greater will be \(\pi_4\) in relation to \(\pi_2^2\).
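The computations accompanying Table 54 can be retraced with a short Python sketch. It is offered only as an illustration: the function names `raw_moments` and `central_moments` are our own, and the data are the percentage frequencies and step deviations of Table 54, so all results are in class-interval units.

```python
# Percentage frequencies f of Table 54 and step deviations d' (in class
# intervals of 100 hours) measured from the assumed mean of 950 hours.
freqs = [1.0, 1.5, 3.1, 4.4, 5.0, 5.7, 6.6, 7.3, 7.6, 7.8,
         7.8, 7.6, 7.3, 6.6, 5.7, 5.0, 4.4, 3.1, 1.5, 1.0]
devs = list(range(-9, 11))  # -9, -8, ..., 10


def raw_moments(freqs, devs):
    """First four moments around the arbitrary origin (nu_1 .. nu_4)."""
    n = sum(freqs)
    return [sum(f * d ** k for f, d in zip(freqs, devs)) / n
            for k in (1, 2, 3, 4)]


def central_moments(v1, v2, v3, v4):
    """Moments around the arithmetic mean (pi_2, pi_3, pi_4)."""
    pi2 = v2 - v1 ** 2
    pi3 = v3 - 3 * v1 * v2 + 2 * v1 ** 3
    pi4 = v4 - 4 * v1 * v3 + 6 * v1 ** 2 * v2 - 3 * v1 ** 4
    return pi2, pi3, pi4


v1, v2, v3, v4 = raw_moments(freqs, devs)
pi2, pi3, pi4 = central_moments(v1, v2, v3, v4)
beta2 = pi4 / pi2 ** 2

print(f"nu:    {v1:.3f} {v2:.3f} {v3:.3f} {v4:.3f}")
print(f"pi:    {pi2:.3f} {pi3:.3f} {pi4:.4f}")
print(f"beta2: {beta2:.2f}")
```

Because the frequencies are symmetrical about 1,000 hours, the third moment around the mean vanishes, which the sketch confirms to within floating-point error.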


Correction of the Moments for Grouping Error 

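Recapitulating what has been said earlier concerning the first four moments
of a frequency distribution:

Moments around an arbitrary origin:

\[ \nu_1 = \frac{\Sigma f d'}{N}. \qquad \nu_2 = \frac{\Sigma f (d')^2}{N}. \]

\[ \nu_3 = \frac{\Sigma f (d')^3}{N}. \qquad \nu_4 = \frac{\Sigma f (d')^4}{N}. \]

Moments around the arithmetic mean:

\[ \pi_1 = 0. \]

\[ \pi_2 = \nu_2 - \nu_1^2. \]

\[ \pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3. \]

\[ \pi_4 = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4. \]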

In computing the mean, the standard deviation, \(\pi_3\), and \(\pi_4\) for frequency
distributions, we made use of the mid-values of the classes as representative
values. We saw, in the previous chapter, that the mid-values were incorrect
assumptions but that the errors present tend to offset each other
when we compute the arithmetic mean. This offsetting is also present
when the third moment is computed. It will be remembered that the
mid-values of the classes preceding the modal class tend to be too small,
while the mid-values of the classes following the modal class tend to be
too large (see Table 38). The result is that the various \(x\) values are
slightly larger (in absolute value) than they should be, and no offsetting
occurs when they are squared or raised to the fourth power. Consequently
the value of \(\pi_2\) (and \(\sigma\)) and the value of \(\pi_4\) are apt to be slightly larger
than the values computed from the same data ungrouped. Sheppard's
corrections attempt to offset this upward bias. The corrected moments
are indicated by \(\mu\) and are:¹⁴

\[ \mu_1 = \pi_1 = 0. \]

\[ \mu_2 = \pi_2 - \tfrac{1}{12}. \]

\[ \mu_3 = \pi_3. \]

\[ \mu_4 = \pi_4 - \tfrac{1}{2}\pi_2 + \tfrac{7}{240}. \]

where all computations are in terms of class intervals.

If we use the class means instead of the mid-values, the arithmetic mean
can be computed accurately. However, if class means are used, the values
of \(\pi_2\) (\(\sigma^2\)) and \(\pi_4\) will be smaller than if computed from the same data
ungrouped. We shall give an arithmetic illustration to show that, when the
mean of each of several groups of figures is substituted for those figures,
\(\sigma\) for the series is decreased; that is, it has a downward bias.

Consider the two following sets of data. The first contains nine differ- 
ent values; the second shows the mean of the first three items repeated 
three times, the mean of the second three items repeated three times, and 
the mean of the last three items repeated three times. The standard de- 
viation of the nine different items is 2.58, but the standard deviation of 
the three groups of means is 2.45. 


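          X      X^2          X      X^2

          1        1          2        4
          2        4          2        4
          3        9          2        4
          4       16          5       25
          5       25          5       25
          6       36          5       25
          7       49          8       64
          8       64          8       64
          9       81          8       64

Total    45      285         45      279

\[ \sigma = \sqrt{\frac{285}{9} - \left(\frac{45}{9}\right)^2} = 2.58. \qquad \sigma = \sqrt{\frac{279}{9} - \left(\frac{45}{9}\right)^2} = 2.45. \]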
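The illustration above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the helper name `std_dev` and the grouping into consecutive threes are our own way of expressing the example.

```python
from math import sqrt


def std_dev(values):
    """Standard deviation computed as sqrt(sum(X^2)/N - (sum(X)/N)^2)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum(v * v for v in values) / n - mean ** 2)


original = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Replace each consecutive group of three items by the mean of that group.
group_size = 3
grouped = []
for start in range(0, len(original), group_size):
    group = original[start:start + group_size]
    grouped.extend([sum(group) / len(group)] * len(group))

print(grouped)                       # the second set of data
print(round(std_dev(original), 2))   # standard deviation of the nine items
print(round(std_dev(grouped), 2))    # standard deviation of the group means
```

The mean is unchanged by the substitution, but the variation within each group is discarded, which is why the second standard deviation is the smaller.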


If a distribution is so flat that the mid-values of each class closely
approximate the class means, the value of \(\sigma\) (and \(\pi_2\) and \(\pi_4\)) based on those
mid-values may have a downward bias. Such a situation is unusual.

Sheppard's corrections may be applied when we are dealing with a
continuous variable and high contact is present at both ends of the series.
By "high contact" we mean that both tails of the distribution approach
the X-axis asymptotically. If these conditions do not obtain, Sheppard's
corrections should not be used, as the corrections may over-correct.
Neither is there justification for applying Sheppard's corrections if the
original observations have not been made with reasonable accuracy.

¹⁴ For a development, see H. L. Rietz (editor), Handbook of Mathematical Statistics,
pp. 92-94, Houghton Mifflin Company, Boston, 1924.

In Table 48 the value of \(\sigma\) was found to be 4.51. If \(\sigma\) is computed for
the ungrouped data of Table 25 (p. 165) the value obtained is 4.45. In
Table 48 the value of \(\pi_2\) (in terms of groups) is seen to be 5.0818. Applying
the correction, we have

\[ \mu_2 = 5.0818 - .0833 = 4.9985. \]

Computing the corrected value of \(\sigma\) gives

\[ \sigma = 2.0\sqrt{4.9985} = 4.47, \]

which agrees somewhat more closely with the value of \(\sigma = 4.45\), as obtained
from the ungrouped data.
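The correction just applied may be sketched in Python as follows. The function names are hypothetical; the second moment 5.0818 and the class interval of 2.0 are the figures quoted above for Table 48, and the moments passed to the functions are assumed to be in class-interval units.

```python
from math import sqrt


def sheppard_mu2(pi2):
    """Corrected second moment: mu_2 = pi_2 - 1/12 (class-interval units)."""
    return pi2 - 1 / 12


def sheppard_mu4(pi2, pi4):
    """Corrected fourth moment: mu_4 = pi_4 - pi_2/2 + 7/240."""
    return pi4 - pi2 / 2 + 7 / 240


pi2 = 5.0818          # second moment around the mean, in class intervals
class_interval = 2.0  # width of each class in original units

sigma_uncorrected = class_interval * sqrt(pi2)
sigma_corrected = class_interval * sqrt(sheppard_mu2(pi2))

print(round(sigma_uncorrected, 2))  # standard deviation before correction
print(round(sigma_corrected, 2))    # standard deviation after correction
```

The first and third moments need no function of their own, since the corrections leave them unchanged.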

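The \(\alpha\)'s and \(\beta\)'s may be computed from the \(\mu\)'s in exactly the same way
as from the \(\pi\)'s. Thus

\[ \alpha_1 = \frac{\mu_1}{\sigma} = \frac{\mu_1}{\sqrt{\mu_2}}, \qquad \alpha_2 = \frac{\mu_2}{\sigma^2} = \frac{\mu_2}{\mu_2} = 1. \]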
CHAPTER XI


DESCRIBING A FREQUENCY DISTRIBUTION
BY A FITTED CURVE


A frequency distribution usually represents a sample drawn from a much 
larger population or universe. Even though a sample is composed of but 
a few hundred or a few score items, it may be reasonably representative 
of the larger universe from which it was drawn. Since it is virtually never 
possible to measure all of the individuals or items comprising a universe, 
we must form our notion of the larger group from a study of a sample. 
We may therefore fit any one of a number of types of curves to a fre- 
quency distribution in order to attempt to describe what appears to be 
the general form of the curve for the entire population. 

The purpose in fitting a curve to a frequency distribution may be any one 
of the following: 

(1) To ascertain whether or not a given curve describes the general 
shape of the distribution. For example, we may wish to demonstrate 
that the chance errors involved in making some measurement may be de- 
scribed by a normal curve (see Chart 113). 

(2) To enable us to generalize concerning the proportions of items 
which should be expected to fall above, below, or between certain values. 
For example, we may take the case of fitting a curve to a frequency dis- 
tribution of the length of life of incandescent lamp bulbs; from such a 
procedure we are enabled to infer what proportion might, in general, be expected
to burn 1,500 hours or more (or more or less than any specified
number of hours). Similarly, in the case of the data shown in Charts 
116, 117, and 119, we may determine the number of individuals which in 
general would be expected to occur above, below, or between any two 
X values. In like fashion the life insurance actuary may fit a curve to, or 
graduate data having to do with, deaths classified by age and thus de- 
termine the expected number of individuals dying during each year of life 
or surviving given ages. 

(3) To enable us to determine, from a curve fitted to a given distribu- 
tion, the probable distribution of values in a closely associated series. For 

example, consider the illustration (developed elsewhere by the writers¹) of
a normal curve fitted to a distribution of the circumference of boys’ heads; 
the curve enables us to draw a reasonable conclusion as to the number of 
caps of specified sizes which should be manufactured for the group from 
which the sample was drawn. Likewise, from a consideration of the 

[Chart 112. The Normal Curve of Error. Vertical axis: number of items; horizontal axis: X values.]

measurements of the circumference of men’s necks (Chart 120), we can 
ascertain the probable number of collars of each size which would be 
needed. 

This chapter will not attempt a comprehensive treatment of the topic 
of fitting frequency curves. For such a discussion, the reader is referred 
to the publications listed at the close of the chapter. We shall consider 
first the symmetrical curve known as the normal curve of error and then, 
briefly, binomials and certain of the simpler skewed curves. 

The Normal Curve of Error 

Development of the normal curve. The concept of the normal curve
(pictured in Chart 112) appears to have been originally developed by
Abraham De Moivre and explained in 1733 in a mathematical treatise²
which its author believed had no practical applications other than as a
solution of problems encountered in games of chance. Gauss later used
the curve to describe the theory of accidental errors of measurements
involved in the calculation of orbits of heavenly bodies. Because of Gauss's
work this curve is sometimes referred to as the Gaussian curve.

¹ See F. E. Croxton and D. J. Cowden, Practical Business Statistics, pp. 257-260,
Prentice-Hall, Inc., New York, 1934.

² Approximatio ad Summam Terminorum Binomii \((a + b)^n\) in Seriem expansi, Nov.
12, 1733, being a second supplement to Miscellanea Analytica, 1730. See Karl Pearson,
"Historical Note on the Origin of the Normal Curve of Errors," Biometrika, Vol. 16 (1924),
pp. 402-404; also, Helen M. Walker, Studies in the History of Statistical Method, pp.
13-17, 22-23, Williams and Wilkins, Baltimore, 1929.

Chart 113 shows a column diagram of 144 measurements of a line³ and
a normal curve of error fitted to these measurements. In fitting a normal
curve it is assumed that only chance errors are present and that the
arithmetic mean of the 144 measurements (2,179.1 feet) represents the
best approximation of the true length of the line. It will be observed:
(1) that small errors are more frequent than large ones, (2) that very large
errors are unlikely to occur, and (3) that positive and negative errors of
the same numerical magnitude are equally likely to occur; in other words,
the curve is symmetrical. Because the fitted curve represents the
relationship between the magnitude of an error and the probability of its
occurrence in a given series of measurements, it is frequently termed the
normal probability curve or simply the normal curve.⁴ It will be seen in
Chart 113 that the observed data have been shown as a rectangular
frequency polygon, while the fitted data are represented by a curve drawn
with a dotted line. Alternatively, the observed data may be shown by a
curve drawn with a solid line.

[Chart 113. Normal Curve Fitted to 144 Measurements of the Length of a Line.
Vertical axis: number of measurements. (Measurements from L. D. Weld, Theory of
Errors and Least Squares, p. 147, The Macmillan Company, New York, 1916.)]

³ The 144 measurements are from L. D. Weld, Theory of Errors and Least Squares,
p. 147, Macmillan, New York, 1916.

[Chart 114. Apparatus to Illustrate the Expansion of the Binomial.]

Chart 114 shows a simple apparatus which illustrates the play of chance
in producing a symmetrical distribution. The device consists of a number
of troughs, open at one end and placed as shown in section A of Chart 114.
Trough d is filled with sand or some similar granular substance. If the
apparatus is tipped so that the left-hand side rises (section B of Chart
114), the sand in trough d will flow 1/2 into trough j and 1/2 into trough k.
This represents the binomial \((\tfrac{1}{2} + \tfrac{1}{2})\).

If the right-hand side of the machine is then raised (section C of Chart 114),
the sand from j will flow 1/2 into c and 1/2 into d, while the sand from k will
flow 1/2 into d and 1/2 into e. Of the total amount of sand, we now have 1/4 in c,
1/2 in d, and 1/4 in e, representing the expansion of the binomial
\((\tfrac{1}{2} + \tfrac{1}{2})^2\). Again tipping the device, as in section D of
Chart 114, 1/2 of the sand from c flows into i, and 1/2 into j; 1/2 of the sand
from d flows into j, and 1/2 into k; and 1/2 of the sand from e flows into k, and
1/2 into l. The result is that 1/8 of all the sand is in i, 3/8 is in j, 3/8 is
in k, and 1/8 is in l, representing the expansion of the binomial
\((\tfrac{1}{2} + \tfrac{1}{2})^3\). Tipping the apparatus as in section E of Chart 114
causes the sand to flow 1/16 into b, 1/4 into c, 3/8 into d, 1/4 into e, and
1/16 into f, representing the expansion of \((\tfrac{1}{2} + \tfrac{1}{2})^4\). Once
more tipping the machine (section F of Chart 114) results in putting 1/32 of the
sand into h, 5/32 into i, 5/16 into j, 5/16 into k, 5/32 into l, and 1/32 into m,
which is the expansion of \((\tfrac{1}{2} + \tfrac{1}{2})^5\).

⁴ See H. M. Goodwin, Precision of Measurement and Graphical Measures, p. 14, G. H.
Ellis Co. (printers), Boston, 1909; also, A. DeF. Palmer, Theory of Measurements, p. 33,
McGraw-Hill, New York, 1912.

The reader may be interested in consulting one authority who denies the applicability
of the Gaussian curve as a description of errors of measurement. See N. R. Campbell,
An Account of the Principles of Measurement and Calculation, Ch. IX, especially p. 182,
note 1, Longmans, Green & Co., London, 1928.
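The successive tippings of the apparatus can be imitated with exact fractions. The following Python sketch is only an illustration of the halving process; the function name `tip` is ours, and the troughs are represented simply by their positions from left to right rather than by the letters of Chart 114.

```python
from fractions import Fraction


def tip(shares):
    """One tipping: each trough sends half of its sand to each of the
    two troughs into which it empties."""
    half = Fraction(1, 2)
    result = [Fraction(0)] * (len(shares) + 1)
    for position, share in enumerate(shares):
        result[position] += share * half
        result[position + 1] += share * half
    return result


shares = [Fraction(1)]  # all of the sand starts in a single trough
for tipping in range(1, 6):
    shares = tip(shares)
    print(tipping, [str(s) for s in shares])
```

After the fifth tipping the shares are the terms of \((\tfrac{1}{2} + \tfrac{1}{2})^5\), in lowest terms.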

While the above illustration is instructive and gives us a picture of the
expanded binomial, the device would become clumsy if we attempted
to carry the expansion of the binomial much farther.⁵ We may obtain
similar results by tossing coins, a procedure which eliminates the necessity
of constructing any apparatus. It is assumed that we are tossing perfect
coins which are evenly balanced and which will not stand on edge. With
such a coin the chances of throwing a tail or a head are identical and may
be expressed by 1/2, where unity (1.0) represents certainty.

⁵ A slightly different device is shown in F. E. Croxton and D. J. Cowden, Practical
Business Statistics, p. 242, Prentice-Hall, Inc., New York, 1934.

[Chart 115A. Expected Results of 10,000 Tosses of Four Coins.]

[Chart 115B. Expected Results of 10,000 Tosses of Six Coins.]

If two coins are tossed simultaneously, we may obtain either no
heads (two tails), a tail and a head, or two heads. In order for two tails to
appear, both coins must fall tails up. To obtain a tail and a head, one coin
may show a tail and the other a head, or the first coin may show a head, the
other a tail. Two heads may appear only if both coins show heads. Since a
tail and a head may occur in two ways, while two tails may occur in but one way,
it follows that there is twice as great a probability of throwing a tail and a head
as of throwing two tails. Similarly, there is twice as great a chance of throwing
a tail and a head as there is of throwing two heads. We may express the
probabilities arising from tossing two coins by

\[ (\tfrac{1}{2}t + \tfrac{1}{2}h)^2, \]

in which the exponent 2 indicates the number of coins being tossed.
Expanding this binomial gives

\[ \tfrac{1}{4}t^2 + \tfrac{1}{2}th + \tfrac{1}{4}h^2. \]

Therefore, if two perfect coins are thrown 1,200 times, we could expect to
obtain \(t^2\) (no heads) 300 times, \(th\) (a head and a tail) 600 times, and
\(h^2\) (two heads) 300 times.

If three coins are tossed, we have the expression

\[ (\tfrac{1}{2}t + \tfrac{1}{2}h)^3 = \tfrac{1}{8}t^3 + \tfrac{3}{8}t^2h + \tfrac{3}{8}th^2 + \tfrac{1}{8}h^3, \]

indicating that, if 1,200 throws were made, there should be no heads 150
times, one head and two tails 450 times, two heads and one tail 450 times,
and three heads 150 times.
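The expected results for any number of coins follow from the binomial coefficients. A minimal Python sketch, assuming perfect coins as in the text (the function name `expected_counts` is our own):

```python
from math import comb


def expected_counts(coins, throws):
    """Expected number of throws showing 0, 1, ..., `coins` heads,
    from the expansion of (1/2 t + 1/2 h) ** coins."""
    return [throws * comb(coins, heads) / 2 ** coins
            for heads in range(coins + 1)]


print(expected_counts(2, 1200))  # two coins thrown 1,200 times
print(expected_counts(3, 1200))  # three coins thrown 1,200 times
```

The same function gives the expected results for four, six, or ten coins by changing the first argument.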


[Chart 115C. Expected Results of 10,000 Tosses of Ten Coins. Vertical axis:
occurrences. (The probability of each combination is indicated by the binomial
expansion shown under each part of Chart 115.)]
The results to be expected from tossing 4 coins are shown in section A of
Chart 115, while the results to be expected from tossing 6 and 10 coins are
shown respectively in parts B and C. All of these curves are symmetrical
and, as the number of coins tossed becomes greater, the curve becomes
smoother. When ten coins are tossed, there are eleven points to be plotted
(see part C), but if 100 coins were tossed, there would be 101 points to
plot and the curve would appear virtually the same as that of Chart 112.
In fact, it can be shown that, as \(m\) becomes infinitely large,
\((\tfrac{1}{2} + \tfrac{1}{2})^m\) approaches as a limit the normal curve of error,⁶ which is

\[ Y_c = \frac{Ni}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}. \]


The symbols are as follows:

\(Y_c\) = the computed height of an ordinate at the distance \(x\) from
the arithmetic mean.

\(N\) = the number of observations in the sample.

\(i\) = the class interval.

\(\sigma\) = the standard deviation of the sample distribution.⁷

\(\pi\) = the constant, 3.14159; \(\sqrt{2\pi} = 2.5066\).

\(e\) = the constant, 2.71828, the base of the Napierian system of
logarithms.

\(x\) = a selected deviation from the arithmetic mean.

Substituting the two constants mentioned above, we may write the
equation

\[ Y_c = \frac{Ni}{2.5066\,\sigma}\, 2.71828^{-\frac{x^2}{2\sigma^2}}. \]
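The equation may be written as a small Python function. This is a sketch only: the function name `normal_ordinate` and the figures in the example call (1,000 observations, a class interval of 1, and a standard deviation of 2) are hypothetical and are not taken from any table in this book.

```python
from math import exp, pi, sqrt


def normal_ordinate(x, n, interval, sigma):
    """Height Y_c of the fitted normal curve at deviation x from the mean.

    n        -- number of observations (N)
    interval -- class interval (i)
    sigma    -- standard deviation of the distribution
    """
    return n * interval / (sigma * sqrt(2 * pi)) * exp(-x ** 2 / (2 * sigma ** 2))


# Hypothetical example: ordinates at the mean and one sigma on either side.
for x in (-2.0, 0.0, 2.0):
    print(x, round(normal_ordinate(x, n=1000, interval=1, sigma=2.0), 2))
```

Because \(x\) enters only as \(x^2\), ordinates at equal distances above and below the mean are identical, and the largest ordinate is the one at the mean.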


Fitting the Normal Curve

In Chart 113 a normal curve was shown fitted to a series of measurements
of a line. It will be observed that those figures were repeated measurements
of the same thing. In Chart 116 we have a different type of data,
representing measurements of a number of individuals from a homogeneous
population. The chance errors involved in repeated measurements
of the same thing can be expected to follow a normal curve. However,
the measurements of a number of different individuals in respect to some
characteristic do not necessarily follow such a curve. A distribution of
the heights of a homogeneous group of adult individuals, for example,
could be expected to be essentially normal, but a distribution of the weights
of the same individuals would be noticeably skewed to the right. While
the basal diameter of the egg-capsules of the snails in Chart 116 is well
described by the fitted normal curve, it is quite likely that the weights of
these same eggs would show definite skewness.

⁶ See G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics,
pp. 177-178, Charles Griffin and Company, London, 1937 (11th Edition); also D.
Caradog Jones, A First Course in Statistics, pp. 180-184, G. Bell and Sons, London, 1921.
The exponent of the binomial is usually denoted by \(n\). We use \(m\), however, since the
symbol \(n\) will be used throughout later parts of the book to refer to "degrees of freedom."

⁷ Strictly speaking, this should be the standard deviation of the universe, which we
do not know. We may make an estimate of this value (\(\hat{\sigma}\)) from
\(\sigma\sqrt{N/(N-1)}\), discussed in the following chapter. However, the difference
between \(\sigma\) and \(\hat{\sigma}\) is negligible when \(N\) is
large, as in the case of the illustrations in this chapter.

[Chart 116. Normal Curve Fitted to Basal Diameters of 99 Egg-Capsules of a Marine
Snail, Sipho curtus. Vertical axis: number of egg-capsules. (Data of basal diameters
from Gunnar Thorson, Studies on the Egg-Capsules and Development of Arctic Marine
Prosobranchs, p. 7, Meddelelser om Grønland udgivne af Kommissionen for
Videnskabelige Undersøgelser i Grønland.)]

The fitted curve in Chart 116 indicates the shape of the distribution 
we should expect if our sample were much larger, or if we had measured 
the entire population. It implies that, if a larger group were studied, we 
should find a few instances with basal diameters both smaller and larger 
than those found in the sample. 

Fitting the normal curve to data of physical ability. The data of Table 
55 show a distribution of the distances which 303 high school freshman 
girls were able to throw a baseball. It may be observed that very few 
of the girls threw the baseball less than 45 feet and very few threw it 115 
feet or farther. The column diagram of this distribution is shown in 
Chart 117. The distribution tends to be symmetrical, and we infer that a
normal curve might reasonably be fitted. We shall, first, determine the 
values of a number of ordinates in order to ascertain the exact outline of 
the fitted curve and, second, compute the theoretical frequencies to be
expected in each class of the distribution. 


TABLE 55

Baseball Throws for Distance by 303 First-
Year High School Girls

Distance in feet          Number of girls

 15 but under  25                1
 25 but under  35                2
 35 but under  45                7
 45 but under  55               25
 55 but under  65               33
 65 but under  75               53
 75 but under  85               64
 85 but under  95               44
 95 but under 105               31
105 but under 115               27
115 but under 125               11
125 but under 135                4
135 but under 145                1

Total                          303

Source: Leonora W. Stewart and Helen West, The
Froebel School, Gary, Indiana. Measurements were made
in 1935.


[Chart 117. Normal Curve Fitted to Data of Baseball Throws for Distance by First-
Year High School Girls. Vertical axis: number of girls. (Data from Tables 55 and 56.)]
Referring again to the formula for the normal curve

\[ Y_c = \frac{Ni}{2.5066\,\sigma}\, 2.71828^{-\frac{x^2}{2\sigma^2}}, \]


it appears that we need the values of \(N\), \(\bar{X}\), and \(\sigma\) in order to fit a normal
curve to a distribution. Computing by procedures described in preceding
chapters, we find \(\bar{X} = 80.63\) feet and \(\sigma = 20.95\) feet. (For comparison
\(\hat{\sigma} = 20.98\) feet.) As there were 303 girls, \(N = 303\).
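These values can be verified from Table 55 with a brief Python sketch. It assumes, as in the preceding chapters, that each class is represented by its mid-value; the variable names are ours.

```python
from math import sqrt

# Table 55: lower class limits (feet) and number of girls in each class.
lower_limits = list(range(15, 145, 10))  # 15, 25, ..., 135
counts = [1, 2, 7, 25, 33, 53, 64, 44, 31, 27, 11, 4, 1]
midpoints = [low + 5 for low in lower_limits]  # class interval is 10 feet

n = sum(counts)
mean = sum(f * m for f, m in zip(counts, midpoints)) / n
variance = sum(f * (m - mean) ** 2 for f, m in zip(counts, midpoints)) / n
sigma = sqrt(variance)                 # standard deviation of the sample
sigma_hat = sigma * sqrt(n / (n - 1))  # estimate for the universe

print(n, round(mean, 2), round(sigma, 2), round(sigma_hat, 2))
```

The closeness of the last two figures shows how little the estimate differs from the sample value when \(N\) is as large as 303.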

We shall first compute the ordinate to be erected at the mean. This is
designated as \(Y_o\) and is the maximum ordinate of the fitted curve. Since
\(x = 0\) at the mean, we have

\[ Y_o = \frac{303 \times 10}{2.5066 \times 20.95}\, 2.71828^{-\frac{0^2}{2(20.95)^2}}. \]

In the expression above, the exponent of 2.71828 is zero. Since a number
raised to the zero power is 1, \(2.71828^{-\frac{0^2}{2(20.95)^2}} = 1\). It is apparent then
that the expression \(2.71828^{-\frac{x^2}{2\sigma^2}}\) is always equal to 1 for the ordinate erected at
the mean (\(Y_o\)) and

\[ Y_o = \frac{Ni}{\sigma\sqrt{2\pi}}. \]

Therefore

\[ Y_c = Y_o\, 2.71828^{-\frac{x^2}{2\sigma^2}}. \]

For the problem in hand

\[ Y_o = \frac{Ni}{\sigma\sqrt{2\pi}} = \frac{303 \times 10}{2.5066 \times 20.95} = 57.7. \]


We now wish to erect enough additional ordinates on either side of \(Y_o\) to
enable us to sketch a reasonably smooth curve. If we select successive
distances of 4.19 feet from the mean, we shall erect ordinates at steps of
\(\tfrac{1}{5}\sigma\) from the mean. The first pair of ordinates (since the curve is
symmetrical) are to be erected at \(x = \pm 4.19\) feet from the mean (\(X = 84.82\)
and 76.44 feet), using the expression

\[ Y_c = 57.7 \times 2.71828^{-\frac{(4.19)^2}{2(20.95)^2}}. \]

In order to determine the value \(Y_c\), it is not necessary to compute
\(2.71828^{-\frac{(4.19)^2}{2(20.95)^2}}\) but merely to refer to Appendix D. Looking up the
appropriate value of \(x/\sigma\), which in this case is \(4.19/20.95 = .20\), we find that

\[ 2.71828^{-\frac{(4.19)^2}{2(20.95)^2}} = .98020 \]

and

\[ Y_c = 57.7 \times .98020 = 56.6. \]

For the next pair of ordinates, \(x = \pm 8.38\) feet (\(X = 89.01\) feet and 72.25
feet) and

\[ Y_c = 57.7 \times 2.71828^{-\frac{(8.38)^2}{2(20.95)^2}}. \]

Here the ratio of \(x/\sigma\) is .40 and, referring to Appendix D, we have

\[ Y_c = 57.7 \times .92312 = 53.3. \]


The process of determining the heights of the ordinates can be handled 
most expeditiously by use of a table similar to Table 56. The ordinates 
in the upper and lower parts of the table are identical since the fitted 
curve is symmetrical. 
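A table of this kind can also be generated directly rather than read from Appendix D. The Python sketch below is one possible arrangement, using the values \(N = 303\), \(i = 10\), \(\bar{X} = 80.63\), and \(\sigma = 20.95\) of the present problem; the names `ordinate` and `rows` are our own.

```python
from math import exp, pi, sqrt

N, INTERVAL, MEAN, SIGMA = 303, 10, 80.63, 20.95

# Maximum ordinate, erected at the mean.
y0 = N * INTERVAL / (SIGMA * sqrt(2 * pi))


def ordinate(z):
    """Height of the fitted curve at x/sigma = z."""
    return y0 * exp(-z * z / 2)


# Ordinates at steps of one-fifth of sigma, from -3.2 sigma to +3.2 sigma.
rows = []
for step in range(-16, 17):
    z = step / 5
    x = z * SIGMA
    rows.append((MEAN + x, x, abs(z), exp(-z * z / 2), ordinate(z)))

for big_x, x, ratio, proportion, height in rows:
    print(f"{big_x:7.2f} {x:+7.2f} {ratio:5.2f} {proportion:8.5f} {height:5.1f}")
```

Each printed line corresponds to one row of the table: the point at which the ordinate is erected, its deviation from the mean, the ratio \(x/\sigma\), the proportionate height, and the height itself.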

The fitted curve is shown in Chart 117. It follows the general shape of 
the sample, but smooths out the irregularities and indicates what might 
be expected if the performance of a very large number of comparable girls 
could be recorded. What we have done so far gives merely the shape of 
the fitted curve and a visual impression of the suitability of the fit, which 
appears good in this instance. 

We have not yet undertaken to say what proportion of girls should be 
expected to throw a baseball 100 feet or more, or what proportion might 
be expected to throw a baseball 50 feet or less. Neither have we attempted 
to say what proportion of girls should theoretically be expected to fall into 
the various classes. The expected frequencies in each class are ascertained 
by integrating the fitted curve.⁸ However, the procedure is greatly simplified
by making use of a table of the areas under the normal curve (Appendix E).
Referring to Table 57, we determine the expected frequencies
in each class as follows: 

1. In column (1) of the table, enter the classes of the original distri- 
bution, allowing for one or two additional classes at each end, since the 
fitted curve should usually have a greater range than the sample. The- 
oretically the fitted curve is of unlimited range in both directions. Allow 
two spaces for the class in which the mean falls. 


⁸ When there is a fairly large number of classes in the sample distribution, the
theoretical frequencies may be ascertained with reasonable accuracy by erecting ordinates
at the mid-value of each class. See F. E. Croxton and D. J. Cowden, Practical Business
Statistics, pp. 249-251, Prentice-Hall, Inc., New York, 1934.



TABLE 56

Determination of Ordinates of Normal Curve Fitted to Data of Baseball
Throws for Distance by First-Year High School Girls
[Mean = 80.63 feet; sigma = 20.95 feet; Yo = 57.7]

X                x                x/sigma    Proportionate       Height of
(in feet, where  (in feet,                   height of           ordinate
ordinates are    deviation                   ordinate            [Col. 4 x Yo]
to be erected)   of X from mean)             (Appendix D)
(1)              (2)              (3)        (4)                 (5)

  13.59          -67.04           3.20        .00598               .3
  17.78          -62.85           3.00        .01111               .6
  21.97          -58.66           2.80        .01984              1.1
  26.16          -54.47           2.60        .03405              2.0
  30.35          -50.28           2.40        .05614              3.2
  34.54          -46.09           2.20        .08892              5.1
  38.73          -41.90           2.00        .13534              7.8
  42.92          -37.71           1.80        .19790             11.4
  47.11          -33.52           1.60        .27804             16.0
  51.30          -29.33           1.40        .37531             21.7
  55.49          -25.14           1.20        .48675             28.1
  59.68          -20.95           1.00        .60653             35.0
  63.87          -16.76            .80        .72615             41.9
  68.06          -12.57            .60        .83527             48.2
  72.25          - 8.38            .40        .92312             53.3
  76.44          - 4.19            .20        .98020             56.6
  80.63             0               0        1.00000             57.7
  84.82          + 4.19            .20        .98020             56.6
  89.01          + 8.38            .40        .92312             53.3
  93.20          +12.57            .60        .83527             48.2
  97.39          +16.76            .80        .72615             41.9
 101.58          +20.95           1.00        .60653             35.0
 105.77          +25.14           1.20        .48675             28.1
 109.96          +29.33           1.40        .37531             21.7
 114.15          +33.52           1.60        .27804             16.0
 118.34          +37.71           1.80        .19790             11.4
 122.53          +41.90           2.00        .13534              7.8
 126.72          +46.09           2.20        .08892              5.1
 130.91          +50.28           2.40        .05614              3.2
 135.10          +54.47           2.60        .03405              2.0
 139.29          +58.66           2.80        .01984              1.1
 143.48          +62.85           3.00        .01111               .6
 147.67          +67.04           3.20        .00598               .3
TABLE 57

Determination of Expected Frequencies in Each Class for Baseball Throws for Distance by First-Year High School Girls

(Mean = 80.63 feet; sigma = 20.95 feet)

Columns: (1) Distance in feet; (2) Lower limits of classes; (3) Upper limits of classes;
(4) x, deviation from mean to limit; (5) x/sigma; (6) Per cent of area between mean and
limit (Appendix E); (7) Per cent of area in each class; (8) Expected frequencies in each
class, N = 303.*

Classes entered in column (1):

Under 5
  5 but under  15
 15 but under  25
 25 but under  35
 35 but under  45
 45 but under  55
 55 but under  65
 65 but under  75
 75 but under  85
 85 but under  95
 95 but under 105
105 but under 115
115 but under 125
125 but under 135
135 but under 145
145 but under 155
155 and over

Total

* One decimal is usually shown in this column in order that the total of the expected frequencies will agree, to within 1 or 2, with the total of the observed frequencies.
This is of importance in making Table 61.


2. In column (2), write the lower limits of each class below the mean
in value and the lower limit of the class which contains the mean.

3. In column (3), write the upper limit of each class above the mean 
in value and the upper limit of the class which includes the mean. 

4. The process of determining the expected frequencies in each class
uses the mean as the basis of reference and involves, first, those classes
above (or below) the mean in value and, then, those below (or above).
We shall therefore ascertain first the expected frequencies between the
mean (80.63 feet) and the upper limit (85 feet) of the class in which the
mean falls. The deviation \(x\) of the upper limit from the mean is 4.37
feet; this value is entered in column (4). Instead of integrating the curve
from \(X = 80.63\) feet to \(X = 85\) feet, we determine the value of \(x/\sigma\) and
make use of Appendix E. Since \(\sigma = 20.95\) feet,

\[ \frac{x}{\sigma} = \frac{4.37}{20.95} = .21. \]

This value is entered in column (5). Now, looking up .21 in Appendix E,
we find .0832, indicating that .0832 of the area of the normal curve (the
total area being expressed as 1.00000) is between the mean and 85 feet.
In other words, 8.32 per cent of the frequencies would be expected to fall
between the mean and 85 feet. This value is entered in column (6). The
procedure is shown graphically in Chart 118.

[Chart 118. Graphic Representation of Procedure in Columns (6) and (7) of Table 57.
Horizontal axis: distance in feet.]

5. The next step consists of determining the expected frequencies between
the mean and the upper limit of the first class above the mean.
This limit is 95 feet; \(x = 14.37\) feet and

\[ \frac{x}{\sigma} = \frac{14.37}{20.95} = .69. \]

Looking up .69 in Appendix E shows that 25.49 per cent of the frequencies
would be expected to occur between the mean and 95 feet. This value is
entered in column (6). If 25.49 per cent of the items fall between 80.63
and 95 feet, while 8.32 per cent of the items fall between 80.63 and 85 feet,
there would be 25.49 - 8.32 = 17.17 per cent of the items between 85 and
95 feet. The result of this subtraction is entered in column (7); this
procedure is also indicated graphically in Chart 118.

6. The procedure in step 5 is repeated for each class above the mean in 
value. The expected frequencies from the mean to the upper limit of 
each class are ascertained, and then the frequencies from the mean to the 
upper limit of the preceding class are subtracted as shown in the table. 

7. The expected frequencies falling between the mean and the lower 
limits shown in column (2) of the table are next determined. Since these 
areas are also cumulative, successive subtraction is again necessary- 

8. We now have entered in column (7) the expected frequencies fox- 
each class except the class containing the mean. We have determined, 
in column (6), that there are 8.32 per cent of the expected frequencies from 
the mean to 85 feet and that there are 10.64 per cent of the expected fre- 
quencies from the mean to 75 feet. Adding these two figures gives 18.96 
per cent, the proportion of expected frequencies falling in this class [see 
column (7) and Chart 118].

9. The total of column (7) should be 100.00, as there are 50.00 per cent 
of the expected frequencies from the mean to either extreme of the distri- 
bution. In order to see the agreement between the observed and the ex- 
pected frequencies, we include column (8), which is obtained by multiply- 
ing 303 by the expected frequency of each class and dividing by 100, or, 
in a single operation, by multiplying 3.03 by the expected frequency of 
each class.⁹

A comparison of the expected frequencies, shown in column (8) of Table 57, with the observed frequencies of Table 55 reveals a general agreement of the figures, the difference being greatest for the class "85 but under 95 feet." A test of the "goodness of fit" of the normal curve, based upon the expected frequencies, will be described later.

⁹ For the class 75 but under 85 feet,

$$18.96 : 100.00 :: f_c : 303,$$

$$100.00\, f_c = 303 \times 18.96,$$

$$f_c = \frac{303 \times 18.96}{100.00} = 3.03 \times 18.96 = 57.4.$$

We have not yet answered the question: "What proportion of girls should be expected to throw a baseball 100 feet or more?" The proportion of expected frequencies from the mean (80.63 feet) to 100 feet ($x = 19.37$ feet) is obtained by computing

$$\frac{x}{\sigma} = \frac{19.37}{20.95} = .92,$$

and referring to Appendix E. The proportion is .3212, or 32.12 per cent. Since 50 out of 100 girls would be expected to throw 80.63 feet or more, it follows that $50.00 - 32.12 = 17.88$, or 17.9 per cent, would in general throw 100 feet or more.

Chart 119. Normal Curve Fitted to Batting Averages of 379 Major and Minor League Baseball Players, 1936. (Vertical axis: number of players; horizontal axis: batting average. Included are only those regular players who were in 75 or more games and at bat 225 or more times. Source: David L. Rolbein.)

To determine the proportion expected to throw 50 feet or less, the procedure is similar.

$$x = 80.63 \text{ feet} - 50 \text{ feet} = 30.63 \text{ feet},$$

$$\frac{x}{\sigma} = \frac{30.63}{20.95} = 1.46.$$

Looking this up in Appendix E, we find .4279, or 42.79 per cent. Subtracting from 50.00 leaves 7.2 per cent as the proportion expected to throw 50 feet or less.
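The two tail computations above can be checked with a short sketch. It assumes that the areas of Appendix E are those of the standard normal curve, which it obtains from the error function instead of a printed table; the function names are illustrative only. The ratio $x/\sigma$ is rounded to two decimals to imitate a table lookup.

```python
from math import erf, sqrt

MEAN, SIGMA = 80.63, 20.95  # feet; constants of the fitted curve given in the text


def area_mean_to_z(z):
    """Proportion of a normal curve lying between the mean and z standard deviations."""
    return 0.5 * erf(abs(z) / sqrt(2.0))


def tail_percent(limit):
    """Per cent of throws expected beyond `limit`, on the side away from the mean."""
    z = round(abs(limit - MEAN) / SIGMA, 2)  # two decimals, as in a printed table
    return 50.0 - 100.0 * area_mean_to_z(z)


at_least_100 = tail_percent(100.0)  # proportion throwing 100 feet or more
at_most_50 = tail_percent(50.0)     # proportion throwing 50 feet or less
print(round(at_least_100, 2), round(at_most_50, 2))
```

The rounding of $x/\sigma$ matters: carrying more decimals shifts the tail proportions slightly away from the tabled figures.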

The normal curve fitted to batting averages. David L. Rolbein has supplied the following interesting illustration of a distribution of a human ability which appears to be adequately described by means of a normal curve (see Chart 119). The observed data of batting averages included all players in the American League, the National League, the American Association, the International Association, and the Pacific Coast League who had played in 75 or more games and who had been at bat 225 or more times. The data therefore included only regular players, $N = 379$. The fitting constants were $\bar{X} = .2942$ and $\sigma = 1.67$ classes, or .0334 units (the class intervals were .020).

Granting unchanged conditions as to skill of batters, pitchers, and fielders and as to liveliness of the baseball, etc., what proportion of regular players would be expected to bat .350 or better? Since $\bar{X} = .2942$, $x = .350 - .2942 = .0558$, and

$$\frac{x}{\sigma} = \frac{.0558}{.0334} = 1.67.$$

From Appendix E, $x/\sigma = 1.67$ gives .4525 (or 45.25 per cent) as the proportion batting between .2942 and .350. Therefore $50.00 - 45.25 = 4.75$ per cent would, in general, be expected to bat .350 or better.

TABLE 58

Neck Circumference of 231 Male College Students

Mid-values      Number of
(in inches)     students
   12.5              4
   13.0             19
   13.5             30
   14.0             63
   14.5             66
   15.0             29
   15.5             18
   16.0              1
   16.5              1
  Total            231

Source: Confidential.

The normal curve and collar sizes. To illustrate this use of the normal curve, let us assume that a maker of collars is considering the production of a collar styled especially for college men. Consideration will, of course, be given to the number of collars of each size which should be made. Since college men represent a selected group, it would be desirable to adjust the manufacturing schedule to their particular requirements. Extensive data on the circumference of the necks of college men are not available, but in Table 58 are shown the neck measurements of 231 male college students. To fit a normal curve, we need $\bar{X} = 14.232$ inches and $\sigma = .719$ inches. The column diagram of the observed data and the fitted curve are shown in Chart 120.

Our problem, in this instance, is not to determine the expected proportion of college men having necks "12.75 but under 13.25" inches in circumference, "13.25 but under 13.75" inches in circumference, etc., but rather to determine the number of collars of each size (by half sizes) which should be made. Experience shows that, on the average, collars are worn about ¾ of an inch larger than the circumference of the neck. This means that collars size 14 would be worn by men whose necks averaged 13.25 inches and, since we are dealing with half sizes, the necks would range from 13 to 13.5 inches in circumference. The first column of Table 59 lists the collar sizes, while the second column shows the corresponding neck circumferences. It is for these classes that we need to ascertain the theoretical frequencies. This is done in the remainder of the columns, and the expected frequencies ($N = 1{,}000$) are shown in column (9). If our basic data are representative, there would be about 270 customers in a thousand calling for size 15 collars, 221 asking for size 14½, 213 requesting size 15½, etc. It is interesting to observe that we might expect only 8 out of a thousand of this group to ask for size 13 or smaller and but 7 out of a thousand to require 17 or larger.

Chart 120. Normal Curve Fitted to Neck Circumference of 231 Male College Students. (Vertical axis: number of students; horizontal axis: circumference of neck in inches. Based on data of Table 58.)
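The collar-size frequencies quoted above can be reproduced approximately with the following sketch. It is not the layout of Table 59; it assumes that each half size covers a neck range of half an inch centered three quarters of an inch below the collar size (size 14 for necks of 13 to 13.5 inches, as stated in the text), and it rounds $x/\sigma$ to two decimals as a table lookup would. The names `per_thousand` and `ALLOWANCE` are hypothetical.

```python
from math import erf, sqrt

MEAN, SIGMA = 14.232, 0.719  # inches; constants fitted to Table 58
ALLOWANCE = 0.75             # collar size minus mid neck size (14 versus 13.25 in the text)


def cdf(z):
    """Area of the standard normal curve below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def z_of(neck):
    """Deviation of a neck size from the mean in sigma units, to two decimals."""
    return round((neck - MEAN) / SIGMA, 2)


def per_thousand(size):
    """Expected customers per 1,000 asking for the half-size collar `size`."""
    mid = size - ALLOWANCE
    lower, upper = mid - 0.25, mid + 0.25
    return 1000.0 * (cdf(z_of(upper)) - cdf(z_of(lower)))


demand = {size: per_thousand(size) for size in (14.5, 15.0, 15.5)}
for size, customers in demand.items():
    print(size, round(customers))
```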

Suitability of the normal curve. As previously pointed out, the normal curve is only one of a number of kinds of curves which may be fitted to a frequency distribution. It should in no sense be thought of as a form having general applicability to all distributions. Since this is true, what guides are there which will tell us when to fit a normal curve or, when one has been fitted, whether it is suitable?

1. The plotted curve or column diagram of the sample distribution serves as a very crude guide. If there is skewness present, it will be apparent, as will also any irregularities.

2. The sample data may be cumulated and put into percentage form as in Table 60; these cumulative percentages may then be plotted on arithmetic probability paper¹⁰ as in Chart 121. If the resulting curve is approximately a straight line, we may proceed with assurance to fit a normal curve.

3. The moments of the sample distribution may be computed as described in Chapter X. From these we may compute

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad \text{(a measure of skewness)},$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} \quad \text{(a measure of kurtosis)},$$

$$\kappa_2 = \frac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} \quad \text{(a general measure of departure from normal)}.$$


TABLE 60

Cumulative Distribution of Baseball Throws for Distance by 303 First-Year High School Girls

Distance in feet     Number      Per cent
                     of girls    of total
Less than 25             1           .33
Less than 35             3           .99
Less than 45            10          3.30
Less than 55            35         11.55
Less than 65            68         22.44
Less than 75           121         39.93
Less than 85           185         61.06
Less than 95           229         75.58
Less than 105          260         85.81
Less than 115          287         94.72
Less than 125          298         98.35
Less than 135          302         99.67
Less than 145          303        100.00

Source: Cumulative data of Table 55.

¹⁰ The vertical scale is so designed that the ogive of a normal curve will appear as a straight line. Paper available from the Codex Book Co., Norwood, Mass.



Chart 121. Baseball Throws for Distance by 303 First-Year High School Girls, Shown on Arithmetic Probability Paper. (Horizontal axis: distance in feet, 15 to 135. Based on data of Table 60.)

If the distribution is essentially normal, we should obtain $\beta_1 = 0$, $\beta_2 = 3$, and $\kappa_2 = 0$. For the throws of a baseball by high school freshman girls, we find $\beta_1 = .010415$, $\beta_2 = 2.772352$, and $\kappa_2 = -.016062$, indicating that a normal curve can probably be used to describe the series. Certain other values of $\beta_1$, $\beta_2$, and $\kappa_2$ indicate the suitability of other curves of the Pearson group.¹²

4. After the curve has been fitted and the theoretical frequencies have been determined, the number of items in the sample is prorated among the classes upon this basis. For the baseball throws, $N = 303$, and the theoretical frequencies for a total of 303 are shown in the last column of Table 57. These theoretical frequencies and the observed frequencies are then compared by means of the $\chi^2$ test,

$$\chi^2 = \sum \frac{(f - f_c)^2}{f_c},$$

where $f$ is an observed frequency in a class and $f_c$ is the corresponding theoretical frequency. Table 61 shows the computation of $\chi^2$ for the data of girls' baseball throws.

The goodness of fit is indicated by $\chi^2$ when considered in conjunction with $n$, the number of degrees of freedom. The number of degrees of freedom is obtained by subtracting from the number of classes the number of degrees of freedom lost in the fitting process. In this case 3 degrees of freedom were lost because the original data and the fitted data were made to agree in respect to the number of items ($N$), the mean ($\bar{X}$), and the standard deviation ($\sigma$). Together, $\chi^2$ and $n$ enable $P$ to be determined, which tells us the probability that a fit as bad or worse might occur because of chance variations of sampling.

On account of the great effect upon $\chi^2$ of differences between small observed and expected frequencies at the ends of a distribution, it is generally necessary to combine two or more classes at each end. Fisher suggests that no group should contain fewer than 5 expected frequencies. Combining the first three classes and the last two classes leaves 10 classes, as shown in Table 61; we have $\chi^2 = 6.39$, $n = 7$, and $P$ is about .50. These results indicate that the normal curve is a good description of the series, since, if the distribution of distances thrown is actually normal, we might expect a fit as bad as or worse than this about 50 times out of a hundred, because of chance variations attributable to sampling. A rule of thumb is often undesirable because inflexible, but we may regard a $P$ of less than .05 as indicating a poor fit.

¹¹ A somewhat more satisfactory test suggested by R. A. Fisher is given in Appendix B, section XI-1. For a discussion of the reliability of $\beta_1$ and $\beta_2$, see L. H. C. Tippett, The Methods of Statistics, p. 86, Williams and Norgate, London, 1937 (2nd Edition).

¹² See Karl Pearson, Tables for Statisticians and Biometricians, pp. lx f., University Press, Cambridge, 1914; and W. P. Elderton, Frequency Curves and Correlation (3rd Edition), Chs. IV and V, Cambridge University Press, Cambridge, England, 1938.

¹³ See Chapter XII, p. 312, for a more complete statement concerning degrees of freedom.

¹⁴ See Appendix I for a table of $P$. The chi-square test is discussed in R. A. Fisher, Statistical Methods for Research Workers, Ch. IV, Oliver and Boyd, Edinburgh, 1938 (7th Edition), and in L. H. C. Tippett, The Methods of Statistics, Ch. IV, Williams and Norgate, London, 1937 (2nd Edition).


TABLE 61

Chi-Square Test of Goodness of Fit for Normal Curve Fitted to Baseball Throws for Distance by First-Year High School Girls

                          f           f_c
Distance in feet       observed    expected    f - f_c   (f - f_c)^2   (f - f_c)^2 / f_c
                       frequency   frequency
      (1)                 (2)         (3)        (4)         (5)             (6)
 15 but under 25           1          1.1  }
 25 but under 35           2          3.2  }    -3.4       11.56            .86
 35 but under 45           7          9.1  }
 45 but under 55          25         20.2        4.8       23.04           1.14
 55 but under 65          33         35.0       -2.0        4.00            .11
 65 but under 75          53         50.6        2.4        5.76            .11
 75 but under 85          64         57.4        6.6       43.56            .76
 85 but under 95          44         52.0       -8.0       64.00           1.23
 95 but under 105         31         37.0       -6.0       36.00            .97
105 but under 115         27         22.0        5.0       25.00           1.14
115 but under 125         11         10.2         .8         .64            .06
125 but under 135          4          3.7  }
135 but under 145          1          1.5  }     -.2         .04            .01
Total                    303        303.0        0                         6.39

$\chi^2 = 6.39$; $n = 10 - 3 = 7$.
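The computation of Table 61 can be sketched as follows. The pooling rule below (combine end classes inward until the expected frequency reaches 5) is one simple reading of Fisher's suggestion and is an assumption of this sketch, as are the function names; the expected frequency for "85 but under 95" is taken as 52.0, the value implied by the column total of 303.0. Summing the unrounded terms gives a value a shade above the 6.39 obtained from the rounded entries of column (6).

```python
# Observed and expected frequencies by class, 15-25 feet up to 135-145 feet.
observed = [1, 2, 7, 25, 33, 53, 64, 44, 31, 27, 11, 4, 1]
expected = [1.1, 3.2, 9.1, 20.2, 35.0, 50.6, 57.4, 52.0, 37.0, 22.0, 10.2, 3.7, 1.5]


def pool_tails(obs, exp, minimum=5.0):
    """Combine classes at each end until every end class expects at least `minimum`."""
    obs, exp = list(obs), list(exp)
    while len(exp) > 1 and exp[0] < minimum:
        obs[1] += obs[0]
        exp[1] += exp[0]
        del obs[0], exp[0]
    while len(exp) > 1 and exp[-1] < minimum:
        obs[-2] += obs[-1]
        exp[-2] += exp[-1]
        del obs[-1], exp[-1]
    return obs, exp


def chi_square(obs, exp):
    """Sum of (f - fc)^2 / fc over the classes."""
    return sum((f - fc) ** 2 / fc for f, fc in zip(obs, exp))


obs, exp = pool_tails(observed, expected)
chi2 = chi_square(obs, exp)
dof = len(obs) - 3  # N, the mean, and the standard deviation were used in fitting
print(len(obs), round(chi2, 2), dof)
```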


Binomials 

It was previously shown that the expansion of a symmetrical binomial $(\tfrac{1}{2} + \tfrac{1}{2})^m$ can be approximated experimentally by tossing coins. An asymmetrical binomial may be expanded experimentally in a similar fashion.

Experimental construction of skewed binomials. Let us consider, first, a single die two sides of which are colored black. If we toss this die, it is apparent that the probability ($p$) of having a black side come up is 1 out of 3, or ⅓, while the probability ($q = 1 - p$) of obtaining a white side is 2 out of 3, or ⅔. We may express the situation as $qw + pb$ or $\tfrac{2}{3}w + \tfrac{1}{3}b$, which indicates that, if the die (assumed to be perfectly balanced) is tossed 1,500 times, we should expect a white side to appear 1,000 times and a black side 500 times.

If, now, we toss two dice (each having two black sides), there may appear either no black faces (2 white faces), a black face and a white face, or 2 black faces. The expression is

$$\left(\tfrac{2}{3}w + \tfrac{1}{3}b\right)^2 = \tfrac{4}{9}w^2 + \tfrac{4}{9}wb + \tfrac{1}{9}b^2.$$

Therefore, if 1,800 throws are made, we should expect to obtain no black faces (two white faces) 800 times, a black face and a white face 800 times, and two black faces 200 times.


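Chart 122. Expected Results of 59,049 Throws of 10 Dice, Each Having Four White Sides and Two Black Sides. (Vertical axis: occurrences in thousands.) The expected occurrences are given by

$$\left(\tfrac{2}{3}w + \tfrac{1}{3}b\right)^{10} = \tfrac{1{,}024}{59{,}049}w^{10} + \tfrac{5{,}120}{59{,}049}w^{9}b + \tfrac{11{,}520}{59{,}049}w^{8}b^{2} + \tfrac{15{,}360}{59{,}049}w^{7}b^{3} + \tfrac{13{,}440}{59{,}049}w^{6}b^{4} + \tfrac{8{,}064}{59{,}049}w^{5}b^{5} + \tfrac{3{,}360}{59{,}049}w^{4}b^{6} + \tfrac{960}{59{,}049}w^{3}b^{7} + \tfrac{180}{59{,}049}w^{2}b^{8} + \tfrac{20}{59{,}049}wb^{9} + \tfrac{1}{59{,}049}b^{10}.$$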

If three such dice are thrown, the expression is

$$\left(\tfrac{2}{3}w + \tfrac{1}{3}b\right)^3 = \tfrac{8}{27}w^3 + \tfrac{12}{27}w^2b + \tfrac{6}{27}wb^2 + \tfrac{1}{27}b^3.$$

It will be observed that the binomial is beginning to show its skewed nature. This will be more clearly seen if we consider throwing ten dice, each with two black sides. The expression is $(\tfrac{2}{3}w + \tfrac{1}{3}b)^{10}$, which is shown graphically in Chart 122. The curve is definitely skewed as a result of the fact that the two fractions $q$ and $p$ are unequal.

If $q$ is a larger fraction and $p$ is smaller, the skewness will be even greater. Let us consider as an illustration a four-sided pyramidal die with one black side and three white sides. It will be necessary to consider the "down" side as the one obtained at a throw. For throwing one die, the probability is $\tfrac{3}{4}w + \tfrac{1}{4}b$.

If 10 of these four-sided dice are thrown, their behavior is indicated by $(\tfrac{3}{4}w + \tfrac{1}{4}b)^{10}$. The expansion of this binomial is shown in Chart 123, which is noticeably more skewed than the curve of Chart 122.

(Figure: a four-sided die, each side of which is an equilateral triangle.)
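The expansions used in this section can be generated mechanically. The sketch below uses exact fractions so that the expected occurrences come out as whole numbers; the function names are illustrative and do not come from the text.

```python
from fractions import Fraction
from math import comb


def binomial_terms(q, p, m):
    """Terms of (q + p)**m, ordered by the power of p (number of black faces)."""
    return [comb(m, k) * q ** (m - k) * p ** k for k in range(m + 1)]


def expected_counts(q, p, m, throws):
    """Expected occurrences of 0, 1, ..., m black faces in `throws` throws of m dice."""
    return [int(term * throws) for term in binomial_terms(q, p, m)]


# Two ordinary dice with two black sides each, thrown 1,800 times.
two_dice = expected_counts(Fraction(2, 3), Fraction(1, 3), 2, 1800)
# Ten such dice thrown 3**10 = 59,049 times (Chart 122).
ten_dice = expected_counts(Fraction(2, 3), Fraction(1, 3), 10, 3 ** 10)
# Ten four-sided dice with one black side, thrown 4**10 = 1,048,576 times (Chart 123).
pyramids = expected_counts(Fraction(3, 4), Fraction(1, 4), 10, 4 ** 10)

print(two_dice)
print(ten_dice)
print(pyramids)
```

Choosing the number of throws equal to the common denominator, as the charts do, is what makes every expected occurrence an integer.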

Fitting a binomial. It is apparent from the expression for a binomial that it is a device most useful for fitting to discrete data. In order to fit a binomial to a series of observed data, the following three steps are necessary: (1) Determine the proper value of $p$, which also gives us $q$, since $q = 1 - p$. The size of $p$ determines the degree of skewness of the curve. If $p = .50$, then $q = .50$ and the curve is symmetrical. The farther removed $p$ is from .50, in either direction, the greater the skewness. If $p < .50$, the curve is positively skewed; if $p > .50$, it is negatively skewed. (2) Expand the binomial $(q + p)^m$, where $m$ = the number of categories minus one, since there are $m + 1$ terms in the expanded binomial. (3) Multiply the total frequencies $N$ by each of the fractions of the expanded binomial.

Chart 123. Expected Results of 1,048,576 Throws of 10 Four-Sided Dice, Each Having Three White Sides and One Black Side. (Vertical axis: occurrences in thousands.) The expected occurrences are given by

$$\left(\tfrac{3}{4}w + \tfrac{1}{4}b\right)^{10} = \tfrac{59{,}049}{1{,}048{,}576}w^{10} + \tfrac{196{,}830}{1{,}048{,}576}w^{9}b + \tfrac{295{,}245}{1{,}048{,}576}w^{8}b^{2} + \tfrac{262{,}440}{1{,}048{,}576}w^{7}b^{3} + \tfrac{153{,}090}{1{,}048{,}576}w^{6}b^{4} + \tfrac{61{,}236}{1{,}048{,}576}w^{5}b^{5} + \tfrac{17{,}010}{1{,}048{,}576}w^{4}b^{6} + \tfrac{3{,}240}{1{,}048{,}576}w^{3}b^{7} + \tfrac{405}{1{,}048{,}576}w^{2}b^{8} + \tfrac{30}{1{,}048{,}576}wb^{9} + \tfrac{1}{1{,}048{,}576}b^{10}.$$

Table 62 shows a distribution of the number of male pigs occurring in litters of five pigs. The arithmetic mean of the series is computed in the usual manner and $\bar{X}$ is found to be 2.4397, the mean number of males per litter of five pigs. Since there are five pigs in a litter, the probability of any given pig (in a litter of five) being born a male is $2.4397 \div 5 = .4879$. In similar fashion, the value of $p$ for any such observed series is obtained from $p = \bar{X}/m$. Here $N = 116$, the number of litters, not 580, the number of pigs.

TABLE 62

Number of Male Pigs Born in Litters of Five and Determination of $\bar{X}$

Number        Number of litters
of males      having specified         fX
   X          number of males
                     f
   0                 2                  0
   1                20                 20
   2                41                 82
   3                35                105
   4                14                 56
   5                 4                 20
 Total             116                283

Source: A. S. Parkes, "Studies on the Sex-Ratio and Related Phenomena: The Frequencies of Sex Combinations in Pig Litters," Biometrika, Vol. 15 (1923), pp. 373-381. Parkes fits a binomial to the same series using $p = .4876$, as determined for litters of 4 to 12 pigs. His expected frequencies are identical with ours.


As pointed out above, the fitting is accomplished by expanding $N(q + p)^m$. Substituting 5 for $m$, but retaining the other symbols,

$$N(q + p)^5 = N(q^5 + 5q^4p + 10q^3p^2 + 10q^2p^3 + 5qp^4 + p^5),$$

where the exponent of $p$ indicates the number of males born in a litter of 5.

The numerical expression to use in fitting the binomial is $(.5121 + .4879)^5$, and since $N = 116$ we should expand $116(.5121 + .4879)^5$. This becomes

$$116[(.5121)^5 + 5(.5121)^4(.4879) + 10(.5121)^3(.4879)^2 + 10(.5121)^2(.4879)^3 + 5(.5121)(.4879)^4 + (.4879)^5].$$

The computations are most conveniently carried out by means of logarithms, as shown in Table 63. Although the powers could be obtained and the multiplications could be performed for this problem by the use of a calculating machine, the use of logarithms is essential when a binomial is raised to an appreciably higher power.

Chart 124 shows the observed and the expected frequencies. The observed data have been shown by means of separated bars to suggest the discrete nature of the series. If the first two and the last two classes are combined, $\chi^2 = .72$. Since there are now four classes and since two degrees of freedom were lost (the number of litters $N$ and $p$ were used in fitting), $n = 2$. The value of $P$ is about .70, indicating a good agreement of the observed with the expected frequencies.

Chart 124. Binomial Fitted to Distribution of Number of Male Pigs Born in Litters of Five. (Vertical axis: number of litters; horizontal axis: number of males. Data from Tables 62 and 63.)
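The whole fitting procedure for the pig litters can be sketched in a few lines, without logarithms. The sketch assumes that $p$ is carried at full precision instead of being rounded to .4879, so the results agree with the text only to about two decimals. The closed form used for $P$ holds for exactly two degrees of freedom and is not a general substitute for Appendix I.

```python
from math import comb, exp

litters = {0: 2, 1: 20, 2: 41, 3: 35, 4: 14, 5: 4}  # males per litter: number of litters
m = 5                                               # pigs per litter
N = sum(litters.values())                           # number of litters, not pigs

mean_males = sum(k * f for k, f in litters.items()) / N
p = mean_males / m
q = 1.0 - p

# Step (2) and (3): expand N(q + p)^m term by term.
expected = [N * comb(m, k) * q ** (m - k) * p ** k for k in range(m + 1)]

# Combine the first two and the last two classes, as in the text.
obs_pooled = [litters[0] + litters[1], litters[2], litters[3], litters[4] + litters[5]]
exp_pooled = [expected[0] + expected[1], expected[2], expected[3], expected[4] + expected[5]]

chi2 = sum((f - fc) ** 2 / fc for f, fc in zip(obs_pooled, exp_pooled))
dof = len(obs_pooled) - 2        # N and p were used in fitting
p_value = exp(-chi2 / 2.0)       # upper-tail probability, valid for 2 degrees of freedom only

print(round(p, 4), [round(e, 1) for e in expected], round(chi2, 2), round(p_value, 2))
```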

It should not be assumed that all discrete series may be fitted by the method just explained. Some data require other types of series, as, for example, the Poisson series, the fitting of which is described by Tippett¹⁵ and others.

¹⁵ L. H. C. Tippett, The Methods of Statistics, pp. 48-54, Williams and Norgate, London, 1937 (2nd Edition). See also R. A. Fisher, Statistical Methods for Research Workers, pp. 56-59, Oliver and Boyd, Edinburgh, 1938 (7th Edition); and H. L. Rietz (editor), Handbook of Mathematical Statistics, Ch. VI, Houghton Mifflin, Boston, 1924.

Skewed Curves

The binomials just discussed are suitable for fitting to discrete data, 
but are not accurate enough to use with continuous data. A fitted bino- 
mial consists of a series of ordinates erected at specific points on the X-axis 
(see Chart 124). If this procedure were applied to a distribution of con- 
tinuous data (or to discrete data where the X units are small in relation to 
the class interval), we should be erecting ordinates at the mid-value of
each class, instead of determining the area under a smooth curve. Obvi- 
ously, the greater the number of classes, the less would be the difference 
between these two procedures. 

There are a great many types of skewed curves which may be fitted to frequency distributions. It is the purpose of this volume, not to enter into an extended consideration of this topic, but merely to sketch briefly the procedure involved in fitting two of the simpler types.¹⁶

Chart 125. Logarithmic Normal Curve Fitted to Kilowatt Hours of Electricity Used per Month in 282 Medium-Class Homes in an Eastern City. (Vertical axis: number of homes. Based on data of Table 64.)

The logarithmic normal curve. Some distributions which are skewed to the right become symmetrical when plotted in terms of the logarithms of their $X$ values or, alternately, when plotted on graph paper having a logarithmic $X$-scale. The column diagram of Chart 125 shows the monthly use of electricity by 282 medium-class homes in an eastern city, drawn from the data of Table 64. It is apparent that the series is decidedly skewed in a positive direction. In Chart 126 these data have been replotted but against a logarithmic $X$-scale. When the curve is extended to $Y = 0$ at $X = 6$ kilowatt hours (the first class below the first one shown in the table), the approximate symmetrical nature of the series in terms of logarithmic $X$ values is apparent. A further indication of this is shown in Chart 127, which presents the cumulative percentage frequencies plotted on logarithmic probability paper.¹⁷

¹⁶ For a more detailed discussion, see: W. P. Elderton, Frequency Curves and Correlation, Cambridge University Press, Cambridge, England, 1938 (3rd Edition); H. L. Rietz, Mathematical Statistics, Open Court Publishing Co., Chicago, 1927; Arne Fisher, Mathematical Theory of Probabilities, Macmillan, New York, 1922 (2nd Edition).

Chart 126. Kilowatt Hours of Electricity Used per Month in 282 Medium-Class Homes in an Eastern City. Logarithmic $X$-scale. (Vertical axis: number of homes; horizontal axis: kilowatt hours. Data of Table 64. Frequencies are plotted at logarithmic mid-values of classes.)

TABLE 64

Kilowatt Hours of Electricity Used per Month in Medium-Class Homes in an Eastern City

Kilowatt hours      Number
(mid-values)        of homes
     10                25
     14                50
     18                53
     22                48
     26                36
     30                26
     34                19
     38                 8
     42                 6
     46                 3
     50                 4
     54                 2
     58                 2
   Total              282

Source: Electrical Testing Laboratories, New York City. Name of city withheld by request.

Fitting a logarithmic normal curve. The procedure for fitting a logarithmic normal curve has been explained by Davies¹⁸ and is essentially the same process as that of fitting a normal curve, save that we use the arithmetic mean $\bar{X}_{\log}$ and the standard deviation $\sigma_{\log}$ of the logarithms of the $X$ values. The values of $\bar{X}_{\log}$ and $\sigma_{\log}$ may be computed by making use of the mid-values of the logarithms of the class limits. Ideally the classes should be so chosen that the class intervals are equal in a logarithmic sense, thus making the logarithmic mid-values equidistant from each other. Usually we are dealing with ready-formed frequency distributions of arithmetically equal class intervals, and with such distributions the direct computation of $\bar{X}_{\log}$ and $\sigma_{\log}$ is laborious. The inconvenience of computing these logarithmic values has been eliminated by Davies, who gives formulae based upon the quartiles, which are readily computed. Furthermore, according to Davies, there are certain advantages to the procedure. He says: "Unless the data are very regular, these [$\bar{X}_{\log}$ and $\sigma_{\log}$] may be more satisfactorily computed from the quartiles, thus avoiding the disturbing effects of irregular extreme items." The expressions are given below.

Chart 127. Kilowatt Hours of Electricity Used per Month in 282 Medium-Class Homes in an Eastern City. Shown on logarithmic probability paper. (Vertical axis: per cent of homes; horizontal axis: kilowatt hours, 10 to 100. Based on data of Table 64.)

¹⁷ Available from Codex Book Company, Norwood, Mass.

$$\bar{X}_{\log} = \frac{\log Q_1 + \log Q_3 + 1.2554 \log Q_2}{3.2554}.$$

This is the weighted average of the three quartiles, the weights being proportional to the heights of normal curve ordinates erected at these values.

$$\sigma_{\log} = .7413(\log Q_3 - \log Q_1).$$

This expression grows out of the fact that in a normal curve 50 per cent of the items are included within $\pm Q$ of the median (or mean), and also that 50 per cent of the items are included within $\pm .6745\sigma$ of the mean. It is therefore obvious that

$$\sigma = \frac{Q}{.6745} = 1.4825Q.$$

Since

$$Q = \frac{Q_3 - Q_1}{2},$$

it follows that

$$Q_3 - Q_1 = 2Q, \quad \text{and} \quad \sigma = .7413(Q_3 - Q_1).$$

For the data of electric consumption, $Q_1 = 15.6400$ kwh., $Q_2$ (the median) $= 21.0833$ kwh., and $Q_3 = 27.9444$ kwh.

$$\bar{X}_{\log} = \frac{\log 15.6400 + \log 27.9444 + 1.2554 \log 21.0833}{3.2554} = \frac{1.194237 + 1.446295 + 1.2554(1.323939)}{3.2554} = \frac{4.302605}{3.2554} = 1.321682.$$

$$\sigma_{\log} = .7413(\log 27.9444 - \log 15.6400) = .7413(1.446295 - 1.194237) = .7413(.252058) = .186851.$$

¹⁸ G. R. Davies and W. F. Crowder, Methods of Statistical Analysis, pp. 303-306; and G. R. Davies, "The Analysis of Frequency Distributions," Journal of the American Statistical Association, Vol. 24, December 1929, pp. 349-366.

Using these two values, we may determine the expected frequencies in each class in a manner strictly parallel to that used previously for the normal curve and by using the same table of areas (Appendix E). Table 65 indicates the procedure. The expected frequencies and the observed frequencies are in close agreement. Note also the correspondence of the column diagram of the original data and the fitted curve in Chart 125. The ordinates are computed from the expression¹⁹

$$Y_c = \frac{.4343\,N i}{X \sigma_{\log} \sqrt{2\pi}}\; e^{-\frac{x_{\log}^2}{2\sigma_{\log}^2}}.$$

Since $\sqrt{2\pi} = 2.5066$, the expression may be simplified for purposes of computation to

$$Y_c = \frac{.4343\,N i}{2.5066\, X \sigma_{\log}}\; e^{-\frac{x_{\log}^2}{2\sigma_{\log}^2}} = \frac{.1733\,N i}{X \sigma_{\log}}\; e^{-\frac{x_{\log}^2}{2\sigma_{\log}^2}}.$$

$X$ is the arithmetic value of the point on the $X$-axis at which the ordinate is to be erected. The values of $e^{-x_{\log}^2 / 2\sigma_{\log}^2}$ are obtained from the Appendix, and the $x_{\log}/\sigma_{\log}$ values are given by

$$\frac{x_{\log}}{\sigma_{\log}} = \frac{\log X - \bar{X}_{\log}}{\sigma_{\log}}.$$

¹⁹ It will be recalled that the expression for the normal curve is

$$Y_c = \frac{N i}{\sigma \sqrt{2\pi}}\; e^{-\frac{x^2}{2\sigma^2}}.$$

For fitting the logarithmic normal curve, the expression cannot be used in this form since $\sigma$ is in terms of logarithms ($\sigma_{\log}$), while the class intervals $i$ are equal arithmetically. We therefore multiply $i$ by the adjustment factor $\frac{\log_{10} e}{X}$ or $\frac{.4343}{X}$ to compensate for the fact that the logarithms of the intervals are not equal. We thus have

$$Y_c = \frac{.4343\,N i}{X \sigma_{\log} \sqrt{2\pi}}\; e^{-\frac{x_{\log}^2}{2\sigma_{\log}^2}}.$$


Davies suggests a logarithmic coefficient of skewness

$$Sk_{\log} = \frac{\log Q_1 + \log Q_3 - 2 \log Q_2}{\log Q_3 - \log Q_1}$$

and points out that a series which yields a coefficient of less than 0.15 (or perhaps even 0.20) may tentatively be considered as logarithmically normal. If, however, a skewed distribution is not inherently logarithmic, Davies notes that it may sometimes be adjusted by shifting the $X$ values until the desired skewness is obtained; after fitting, the $X$ values are again shifted. This correction $c$ is obtained by

$$c = \frac{Q_2^2 - Q_1 Q_3}{Q_1 + Q_3 - 2Q_2}.$$

This value is added to the class limits and to the quartiles, after which $\bar{X}_{\log}$ and $\sigma_{\log}$ are computed. The fitting proceeds as in Table 65, but the shifted class limits are used. After the expected frequencies have been ascertained, the class limits are shifted back to their original values. It is obvious that this device extends the usefulness of the logarithmic normal curve.
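The expressions of this section can be gathered into one sketch: class frequencies from the table of areas, ordinates from the adjusted normal-curve expression, Davies' coefficient of skewness, and the shift $c$. The layout is not that of Table 65, and the function names are assumptions of this sketch. The error function stands in for Appendix E, and $\log_{10} e$ is used in place of the rounded factor .4343.

```python
from math import e, erf, exp, log10, pi, sqrt

N, WIDTH = 282, 4.0
MEAN_LOG, SIGMA_LOG = 1.321682, 0.186851  # constants computed in the text


def z_log(x):
    """Deviation of log X from the logarithmic mean, in sigma units."""
    return (log10(x) - MEAN_LOG) / SIGMA_LOG


def cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def expected_frequency(lower, upper):
    """Expected number of homes between two arithmetic class limits."""
    return N * (cdf(z_log(upper)) - cdf(z_log(lower)))


def ordinate(x, width=WIDTH):
    """Height of the fitted curve at X, with the adjustment factor log10(e) / X."""
    factor = log10(e) / x
    return N * width * factor / (SIGMA_LOG * sqrt(2.0 * pi)) * exp(-z_log(x) ** 2 / 2.0)


def log_skewness(q1, q2, q3):
    """Davies' logarithmic coefficient of skewness."""
    return (log10(q1) + log10(q3) - 2.0 * log10(q2)) / (log10(q3) - log10(q1))


def shift_constant(q1, q2, q3):
    """Amount c to add to the X values so that the logarithmic skewness vanishes."""
    return (q2 ** 2 - q1 * q3) / (q1 + q3 - 2.0 * q2)


classes = [(mid - 2.0, mid + 2.0) for mid in range(10, 62, 4)]
frequencies = [expected_frequency(lo, hi) for lo, hi in classes]

q1, q2, q3 = 15.6400, 21.0833, 27.9444
sk = log_skewness(q1, q2, q3)
c = shift_constant(q1, q2, q3)
sk_shifted = log_skewness(q1 + c, q2 + c, q3 + c)
print([round(f, 1) for f in frequencies], round(sk, 4), round(c, 3))
```

The shift formula follows from requiring $(Q_1 + c)(Q_3 + c) = (Q_2 + c)^2$, which is why the coefficient computed from the shifted quartiles is zero apart from rounding.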

Fitting a normal curve with adjustment for skewness. The formulae previously given for the normal curve enabled us to fit a symmetrical curve from a knowledge of $\bar{X}$, $\sigma$, and $N$. We have just considered one method of fitting a skewed curve. Another procedure that is useful for certain skewed distributions consists of using also a measure of skewness

$$\alpha_3 = \frac{\mu_3}{\mu_2^{3/2}} \quad \left(\text{or } \frac{\pi_3}{\pi_2^{3/2}} \text{ if Sheppard's correction is not applied}\right)$$

and thereby making a correction to the fit of a normal curve. This is sometimes referred to as a second approximation curve. The equation²⁰ is

$$Y_c = \frac{N i}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}} - \left\{ \left[ \frac{N i}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}} \right] \frac{\alpha_3}{2} \left( \frac{x}{\sigma} - \frac{x^3}{3\sigma^3} \right) \right\}.$$

²⁰ The expression includes the first two terms of the Gram-Charlier series. For a further discussion, see W. A. Shewhart, Economic Control of Quality of Manufactured Product, pp. 84-94, D. Van Nostrand, New York, 1931.

The expression preceding the minus sign is that for the normal curve, while the expression in braces represents a modification for skewness. In order to determine the expected frequencies, the above equation must be integrated. This is accomplished by the use of tables. To use these tables, we write

$$\int_0^x f(x)\,dx = F_1\!\left(\frac{x}{\sigma}\right) - \alpha_3 F_2\!\left(\frac{x}{\sigma}\right).$$
TABLE 66

Computation of $\bar{X}$, $\sigma$, and $\alpha_3$ for Depth of Sapwood

Depth in
inches             f       d'      fd'     f(d')^2    f(d')^3
(mid-values)
   1.0             2       -7      -14         98        -686
   1.3            29       -6     -174      1,044      -6,264
   1.6            62       -5     -310      1,550      -7,750
   1.9           106       -4     -424      1,696      -6,784
   2.2           153       -3     -459      1,377      -4,131
   2.5           186       -2     -372        744      -1,488
   2.8           193       -1     -193        193        -193
   3.1           188        0        0          0           0
   3.4           151        1      151        151         151
   3.7           123        2      246        492         984
   4.0            82        3      246        738       2,214
   4.3            48        4      192        768       3,072
   4.6            27        5      135        675       3,375
   4.9            14        6       84        504       3,024
   5.2             5        7       35        245       1,715
   5.5             1        8        8         64         512
  Total        1,370              -849     10,339     -12,249

Source: Data from W. A. Shewhart, Economic Control of Quality of Manufactured Product, p. 77, D. Van Nostrand Co., New York, 1931. Courtesy of D. Van Nostrand Co., Inc.

$$\nu_1 = \frac{\sum f d'}{N} = \frac{-849}{1{,}370} = -.619708.$$

$$\nu_2 = \frac{\sum f (d')^2}{N} = \frac{10{,}339}{1{,}370} = 7.546716.$$

$$\nu_3 = \frac{\sum f (d')^3}{N} = \frac{-12{,}249}{1{,}370} = -8.940876.$$

$$\bar{X} = 3.1 - [(.619708)(.3)] = 2.9141 \text{ inches}.$$

Since Sheppard's correction is not applied, we have

$$\pi_2 = \nu_2 - \nu_1^2 = 7.162677,$$

$$\pi_3 = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = 4.613422.$$
$$\sigma = i\sqrt{\pi_2} = .8029 \text{ inches}.$$

In the integrated expression given before Table 66, $F_1\!\left(\frac{x}{\sigma}\right)$ represents the areas of the normal curve (given in Appendix E) and $\alpha_3 F_2\!\left(\frac{x}{\sigma}\right)$ represents the modification for skewness. Values of $F_2\!\left(\frac{x}{\sigma}\right)$ are obtained from Appendix J and are then multiplied by $\alpha_3$.
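The constants of Table 66 and the integrated curve can be sketched together. Instead of the tabled $F_2$ values of Appendix J, the sketch uses the closed form of the integral, under which the cumulative proportion below $z = x/\sigma$ is $\Phi(z) - \tfrac{\alpha_3}{6}(z^2 - 1)\varphi(z)$; that substitution, and the function names, are assumptions of this sketch and not the tabular procedure of the text. Differences of this cumulative function between two class limits agree with differences of $F_1 - \alpha_3 F_2$.

```python
from math import erf, exp, pi, sqrt

WIDTH, ORIGIN = 0.3, 3.1  # class interval and arbitrary origin, in inches
freq = [2, 29, 62, 106, 153, 186, 193, 188, 151, 123, 82, 48, 27, 14, 5, 1]
dev = list(range(-7, 9))  # d' for mid-values 1.0, 1.3, ..., 5.5
N = sum(freq)

# Moments about the arbitrary origin, in class-interval units.
nu1, nu2, nu3 = (sum(f * d ** k for f, d in zip(freq, dev)) / N for k in (1, 2, 3))
# Moments about the mean, without Sheppard's correction.
pi2 = nu2 - nu1 ** 2
pi3 = nu3 - 3.0 * nu1 * nu2 + 2.0 * nu1 ** 3

mean = ORIGIN + nu1 * WIDTH
sigma = WIDTH * sqrt(pi2)
alpha3 = pi3 / pi2 ** 1.5


def cumulative(x):
    """Proportion of items below x under the second approximation curve."""
    z = (x - mean) / sigma
    normal = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    density = exp(-z * z / 2.0) / sqrt(2.0 * pi)
    return normal - alpha3 / 6.0 * (z * z - 1.0) * density


def expected(lower, upper):
    """Theoretical frequency between two class limits, for N items."""
    return N * (cumulative(upper) - cumulative(lower))


print(round(mean, 4), round(sigma, 4), round(alpha3, 4))
print(round(expected(2.95, 3.25), 1))  # class with mid-value 3.1
```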

As an illustration of this method of fitting, we use the data of Table 66, which are shown graphically in Chart 128. The fitting procedure²¹ for a second approximation curve is shown in Table 67. The values of $N$, $\bar{X}$, $\sigma$, and $\alpha_3$ having been obtained (Table 66), the steps are as follows:

1. Make entries in columns (1) to (6) inclusive, as was done in fitting a normal curve.

2. Refer to Appendix J and enter in column (7) the $F_2\!\left(\frac{x}{\sigma}\right)$ values associated with each $\frac{x}{\sigma}$ value of column (5). Negative signs are entered in this column for the percentages associated with class limits of column (2).

Chart 128. Second Approximation Curve Fitted to Depth of Sapwood. (Vertical axis: number. Based on data of Table 66.)

²¹ Sheppard's correction has not been applied in the computation of the second moment, partly because the distribution is skewed. Furthermore, Shewhart points out (op. cit., p. 78) that the corrected standard deviation (.798211) differs more from the standard deviation of the ungrouped data (.802555) than does the uncorrected standard deviation (.802895). When high contact is not present, overcorrection of a moment is not unusual. It arises because the corrections allow for non-existent classes at the extremes.




TABLE 67

Fit of Second Approximation Curve to Data of Depth of Sapwood
($\bar{X}$ = 2.9141 inches; $\sigma$ = .8029 inches; $\alpha_3$ = +.2407)

[Table body not recoverable. Column headings: (1) depth in inches (mid-values);
(2) lower limits of classes; (3) upper limits of classes; (4) $x$; (5) $x/\sigma$;
(6) normal curve areas, as percentages; (7) $F_2(x/\sigma)$, as percentages;
(8) $\alpha_3 F_2(x/\sigma)$, as percentages; (9) Col. 6 − Col. 8, as percentages;
(10) theoretical frequencies as percentages, N = 100 per cent;
(11) theoretical frequencies, N = 1,370.]

The values may be conveniently read from the table of ordinates of the normal curve (Appendix D), or from a more extensive table
in Karl Pearson, Tables for Statisticians and Biometricians, pp. 2-8, University Press, Cambridge (England), 1914. The values for z shown
3. In column (8), multiply each value of column (7) by $\alpha_3$. Signs
are shown.

4. To produce column (9), the values in column (8) are subtracted
algebraically from the values in column (6).

5. The cumulative areas or frequencies of column (9) are decumulated
in column (10), as was done for the normal curve. The result is a series
of figures showing expected frequencies on the basis of the second
approximation for N = 100 per cent. One of the shortcomings of this curve is
that it may occasionally produce negative frequencies at one end, or, if
we do not extend the fit far enough to produce these negative frequencies,
the total may slightly exceed 100 per cent. In this instance column (10)
totals 100.02.

6. In column (11) the frequencies are prorated among the classes so
that the total equals the N of the sample.
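The six steps can be sketched in code under some loudly stated assumptions. The sketch does not use Appendix J; instead it assumes that the skewness modification is the usual first Gram-Charlier correction, so that the cumulative proportion up to a standardized limit $t$ is $\Phi(t) - \frac{\alpha_3}{6}(t^2 - 1)\varphi(t)$. It also cumulates from the left tail rather than outward from the mean, which gives the same class differences. The class limits (.85 to 5.65 inches in steps of .3) are hypothetical, since the body of Table 67 is not legible, so the output is not expected to reproduce the table's figures exactly.

```python
from math import erf, exp, pi, sqrt

def normal_ordinate(t):
    """Ordinate of the standard normal curve at t."""
    return exp(-0.5 * t * t) / sqrt(2 * pi)

def normal_area(t):
    """Area of the standard normal curve to the left of t."""
    return 0.5 * (1 + erf(t / sqrt(2)))

def second_approx_area(t, alpha3):
    """Assumed Gram-Charlier form: normal area minus the skewness term."""
    return normal_area(t) - (alpha3 / 6) * (t * t - 1) * normal_ordinate(t)

def fit_second_approximation(limits, mean, sigma, alpha3, n):
    """Decumulate areas at the class limits, then prorate to total n."""
    cumulative = [second_approx_area((x - mean) / sigma, alpha3) for x in limits]
    percentages = [100 * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]
    total = sum(percentages)
    frequencies = [p * n / total for p in percentages]
    return percentages, frequencies

# Hypothetical class limits: 16 classes with mid-values 1.0, 1.3, ..., 5.5 inches.
limits = [round(0.85 + 0.3 * i, 2) for i in range(17)]
percentages, frequencies = fit_second_approximation(
    limits, mean=2.9141, sigma=0.8029, alpha3=0.2407, n=1370
)
for lo, hi, f in zip(limits, limits[1:], frequencies):
    print(f"{lo:4.2f} to {hi:4.2f}: {f:7.1f}")
```

Prorating in the last step mirrors step 6: whatever the percentage column totals, the fitted frequencies are scaled so that they sum to the N of the sample.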
CHAPTER XII

RELIABILITY AND SIGNIFICANCE OF 
STATISTICAL MEASURES 

ARITHMETIC MEANS 


In the previous chapter it was observed that errors of measurement tend
to follow the normal law. The chance occurrences growing out of tossing
coins and from partitioning sand by means of the apparatus of Chart 114
likewise show a definite tendency to assume the normal shape. In this
and the following chapter we are interested in a study of means and other
computed values, obtained from samples of a larger body of data. We
shall want to know how much reliance we can place in a statistical measure
computed from a sample, and we shall make use of the concept of the
normal curve to begin our attack upon this problem.


Reliability of Sample Means, Large Samples 


If we have a large body of data (for example, figures for thousands of
automobile tires of the same size, quality, and make, used on similar
vehicles, showing the distance run by each tire), we may study the entire
set of data or we may study one or more samples drawn at random from
the data. If we select a sample of (say) 200 items, we shall find that the
sample will furnish us much useful information. For instance, the mean
from the sample will probably not be greatly different from the mean of
all of the data. Furthermore, the larger the size of the sample (and the
smaller the variability in the basic data), the greater the likelihood that
our sample mean will agree closely with the population mean. If we selected
an additional sample of 200 items, we should not necessarily get the same
value for the mean of this sample as for the first. It would be possible
to select 1,000 samples of 200 items each and then, from the 1,000 means
thus obtained, we could determine a standard deviation of the sample
means, which would tell us the amount of variability present in our sample
means. Thus we might write




$$\sigma_{\bar{X}} = \sqrt{\frac{(\bar{X}_1 - \bar{X}_p)^2 + (\bar{X}_2 - \bar{X}_p)^2 + \cdots + (\bar{X}_k - \bar{X}_p)^2}{k}},$$
where $\sigma_{\bar{X}}$ is the standard deviation of the sample means;

$\bar{X}_1$, $\bar{X}_2$, etc., are the means of successive samples;

$\bar{X}_p$ is the mean of all the items in the population (that is, the
population mean);

$k$ is the number of means considered (or the number of samples
drawn).

It would be very unusual to have such an exhaustive collection of sample
data as indicated above. Generally there is available but one or a few
samples of a larger group, commonly referred to as the "population" or
"universe." It so happens, however, that the means of samples drawn

Chart 129. Distribution of Means of 100 Random Samples of 10 Items Each and
of Population of 972 Wage Earners' Weekly Earnings. Even though each sample
consisted of but 10 items, the approximate symmetry of the curve of sample means is
apparent. (Vertical axis: per cent of total; horizontal axis: wages in dollars.)

at random from a normal population tend to form a normal curve around 
the population mean, and for that reason it is easily possible to infer, from 
limited information, the behavior of arithmetic means computed from such 
samples. If the distribution of the population is not exactly normal, the 
distribution of means of random samples tends to normality as the size of 
the sample is increased. (The behavior of measures computed from small 
samples will be discussed in the latter part of this chapter.)
Chart 129 shows the results of drawing a number of samples, each of 10
items, from a larger population. The solid curve shows the original
distribution, which is definitely skewed. The curve of the means, however, is
nearly symmetrical. Notice the more limited spread of the curve of the
sample means. If the size of the 100 samples had been larger, the spread
of the curve of sample means would have been less.¹
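The behavior pictured in Chart 129 can be imitated with a small simulation. This is only a sketch: the 972 wage figures are not reproduced in the text, so a skewed (lognormal) population with arbitrary parameters is generated as a stand-in, and the seed is fixed merely to make the run repeatable. Samples are drawn with replacement, in the spirit of the disc-drawing procedure described in the footnote.

```python
import random
from statistics import mean, pstdev

random.seed(129)

# Hypothetical skewed population standing in for the 972 weekly earnings.
population = [random.lognormvariate(3.2, 0.4) for _ in range(972)]

def sample_means(pop, n_samples, size):
    """Means of n_samples random samples (with replacement) of a given size."""
    return [mean(random.choices(pop, k=size)) for _ in range(n_samples)]

means_10 = sample_means(population, 100, 10)
means_40 = sample_means(population, 100, 40)

print(f"population: mean {mean(population):.2f}, std. dev. {pstdev(population):.2f}")
print(f"means, samples of 10: mean {mean(means_10):.2f}, std. dev. {pstdev(means_10):.2f}")
print(f"means, samples of 40: mean {mean(means_40):.2f}, std. dev. {pstdev(means_40):.2f}")
```

The sample means cluster around the population mean with a much smaller spread than the population itself, and the spread shrinks again when the sample size is raised from 10 to 40.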

The standard error of a sample mean. Suppose that, for the thousands
of tires referred to previously, the mean mileage is 15,200 and the standard
deviation is 1,248 miles. Suppose, further, that a sample of 900 tires
shows $\bar{X}$ = 15,223 miles and $\sigma$ = 1,230 miles. We are interested in
knowing, first, how much variability should be expected in means from samples
of this size and, second, whether or not 15,223 miles represents a significant
divergence from the population mean. The variability may be obtained
from²

$$\sigma_{\bar{X}} = \frac{\sigma_p}{\sqrt{N}},$$

where $\sigma_{\bar{X}}$ is the standard error of means drawn from samples (the value
which would be obtained if we should compute the standard
deviation of the means of all possible samples of N items);
$\sigma_p$ is the standard deviation of the entire population;

N is the number of items in the sample.

For the above data

$$\sigma_{\bar{X}} = \frac{1{,}248}{\sqrt{900}} = 41.6 \text{ miles}.$$


¹ To select a random sample experimentally, we may record all of the original
(population) data on small cardboard discs and place them in a container, then mix the discs
thoroughly, select one disc, record the entry, replace the disc, mix again, and repeat.

² For a derivation of this expression, see Appendix B, section XII-1. As is apparent
from that development, our methods assume that the population is infinite. In Fisher's
words, "the values or sets of values before us are interpreted as a random sample of a
hypothetical infinite population of such values as might have arisen in the same
circumstances." (R. A. Fisher, Statistical Methods for Research Workers, p. 7, Oliver and
Boyd, Edinburgh, 1938, 7th Edition). If the population is finite, but very large, the
expression $\sigma_{\bar{X}} = \sigma_p / \sqrt{N}$ is adequate. However, if the population is finite and the number
in the sample N is not negligible in relation to the size of the population P then, as is
shown in Appendix B, section XII-1, the expression becomes

$$\sigma_{\bar{X}} = \frac{\sigma_p}{\sqrt{N}} \sqrt{\frac{P - N}{P - 1}}.$$

Sample medians are less reliable than sample means; $\sigma_{\text{med}} = 1.2533\,\sigma_{\bar{X}}$ if the
population is normal. For a discussion of the reliability of the median, quartiles, and
percentiles see G. Udny Yule and M. G. Kendall, An Introduction to the Theory of
Statistics, pp. 380-385, Charles Griffin and Co., London, 1937 (11th Edition).
The interpretation of this measure is analogous to that of the standard
deviation. If additional samples of the same size are drawn at random
from this population, we should expect 68.27 per cent of the means to fall
within ±41.6 miles of the population mean, that is, within the range of
15,241.6 and 15,158.4 miles; we should expect 95.45 per cent to fall within
a range of 15,200 ± 83.2 miles (population mean $\pm 2\sigma_{\bar{X}}$); we should expect
99.73 per cent or nearly all to fall within 15,200 ± 124.8 miles (population
mean $\pm 3\sigma_{\bar{X}}$). For convenience we occasionally use a measure known as
the probable error of the mean ($PE_{\bar{X}}$); this is³ $.6745\sigma_{\bar{X}}$. Plus and minus
one $PE_{\bar{X}}$ of the population mean indicates the range within which 50 per
cent of the sample means would be expected to fall. Thus 50 per cent of
the sample means would be expected to fall within 15,200 ± 28.1 miles.

It will be noticed that the sample mean shown above is well within the
50 per cent range.
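As a check on the figures in the two preceding passages, the following sketch computes the standard error for the tire data and the proportion of a normal curve lying within a given number of standard errors of the mean. The function names are illustrative only; the percentages come from the error function rather than from Appendix E, so they agree with the table to its rounding.

```python
from math import erf, sqrt

def standard_error(sigma_p, n):
    """Standard error of the mean for a population standard deviation sigma_p."""
    return sigma_p / sqrt(n)

def coverage(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

POP_MEAN = 15200
se = standard_error(1248, 900)

for k in (0.6745, 1, 2, 3):
    half_width = k * se
    print(
        f"{100 * coverage(k):6.2f} per cent of sample means within "
        f"{POP_MEAN} +/- {half_width:.1f} miles"
    )
```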

Significance of the deviation of a sample mean from the mean of a
known population. The figures just given tell us what variation may be
expected in sample means, because of the operation of chance in the
drawing of random samples. We know that 68.27 per cent of the sample
means would be expected to fall between 15,241.6 and 15,158.4 miles.
The mean which we obtained from the sample of 900 cases was 15,223
miles. This differs +23 miles from the population mean. What is the
probability of obtaining a sample mean differing by +23 miles or more
from the population mean? Chart 130 shows graphically that the
probability of getting a sample mean which differs by +23 miles or more from
the population mean is very large. We may put this in numerical terms
by determining the area of the cross-hatched section shown on the curve.
We take the distance on the horizontal scale from the population mean to
the observed mean as $x$; thus

$$x = \bar{X} - \bar{X}_p = 15{,}223 - 15{,}200 = +23 \text{ miles}.$$

If we express this deviation in terms of $\sigma_{\bar{X}}$, we may ascertain the area of
the white section A of Chart 130 from Appendix E. Thus

$$\frac{x}{\sigma} = \frac{\bar{X} - \bar{X}_p}{\sigma_{\bar{X}}} = \frac{23}{41.6} = .55,$$

and from the appendix we find that 20.88 per cent of the total area is
included between 15,223 and 15,200 on the horizontal scale. Subtracting
this from 50 per cent (half the curve lies above the mean), we have 29.12
per cent, indicating that in 29 cases out of 100 we might expect a sample

³ See Appendix E. Twenty-five per cent of the area of a normal curve is included
between an ordinate erected at the mean and an ordinate erected at a distance $.6745\sigma$
on the horizontal axis from the mean.
mean to exceed the population mean by 23 miles or more. Sometimes we
wish to ascertain the probability that a sample mean might either exceed
or fall below the population mean by a given amount. Suppose we wish
to ascertain the chances that a sample mean might either exceed or fall
below the population mean by 23 miles. This is 29 + 29 = 58 chances
out of 100 (graphically, the addition of the cross-hatched and stippled
parts of Chart 130), and it is apparent that the chance variations of
sampling may have caused the variation of 23 miles. This difference
therefore is not significant, indicating that the sample may well have been a
random sample from the known population.

Chart 130. Expected Distribution of Sample Means of Tire Mileage and Chances
of Obtaining Sample Means Differing from the Population Mean by +23 and −23
Miles, When $\bar{X}_p$ = 15,200 Miles, N = 900, and $\sigma_{\bar{X}}$ = 41.6 Miles. (Horizontal axis:
values of sample means.)

Suppose that a sample mean (N = 900), possibly taken from the same
universe, was 15,071 miles. This differs from the population mean by
−129 miles. What are the chances that a mean of a random sample
might differ by −129 miles or more from the population mean? The
value of $\sigma_{\bar{X}}$ is 41.6 miles, as before, and

$$\frac{x}{\sigma} = \frac{x}{\sigma_{\bar{X}}} = \frac{-129}{41.6} = -3.10.$$

From Appendix E we find that $\frac{4{,}990.3}{10{,}000}$ of the curve is included between
15,200 and 15,071 miles; therefore $\frac{9.7}{10{,}000}$ is included to the left of 15,071.

This is Area I of Chart 131. By pure chance, sample means would fall
below the population mean 5,000 times out of 10,000. Since there are
but 9.7 chances out of 10,000, or about 1 out of 1,000, that a sample mean
might fall below the population mean by 129 miles or more, we must
conclude that the difference is real.

Adding to Area I of Chart 131 the part designated as Area II, which
shows the chances that a sample mean might exceed the population mean
by 129 miles or more, we have $\frac{19.4}{10{,}000}$, or 2 per 1,000, as the chances that
a sample mean (N = 900) might exceed or fall below the population mean
by 129 miles or more. Upon either this or the preceding basis, chance is
virtually ruled out, the sample mean is significantly different from the
population mean, and we must conclude either that the sample was
actually drawn from a different universe or that the sample was improperly
chosen.

Chart 131. Expected Distribution of Sample Means of Tire Mileage and Chances
of Obtaining Sample Means Differing from the Population Mean by +129 Miles and
−129 Miles, When $\bar{X}_p$ = 15,200 Miles, N = 900, and $\sigma_{\bar{X}}$ = 41.6 Miles.
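Both tire examples can be worked without the table of areas. The sketch below uses `statistics.NormalDist` from the Python standard library in place of Appendix E; the function name is illustrative. Because the ratio is not rounded to two decimals before the area is found, the results differ slightly from the tabled figures (about 29.0 rather than 29.12 per cent in the first case).

```python
from statistics import NormalDist

def tail_probabilities(sample_mean, pop_mean, std_error):
    """Return the ratio x/sigma and the one-tail and two-tail chance probabilities."""
    ratio = (sample_mean - pop_mean) / std_error
    one_tail = 1 - NormalDist().cdf(abs(ratio))
    return ratio, one_tail, 2 * one_tail

STD_ERROR = 41.6  # miles, from sigma_p = 1,248 and N = 900

for observed in (15223, 15071):
    ratio, one_tail, two_tail = tail_probabilities(observed, 15200, STD_ERROR)
    print(
        f"sample mean {observed}: ratio {ratio:+.2f}, "
        f"one tail {10000 * one_tail:.1f} per 10,000, "
        f"two tails {10000 * two_tail:.1f} per 10,000"
    )
```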

The null hypothesis. We have just set up the hypothesis that our 
sample of 900, which has a mean of 15,071 miles, is a random sample 
drawn from the population having a known mean of 15,200 miles. We 
then proceeded to ascertain the probability that a difference as great as 
that between the population mean and the observed sample mean might 
occur, because of chance factors arising from random sampling. The dif- 
ference was so great that much doubt was cast upon our hypothesis and 
we abandoned it, concluding that the sample mean was significantly dif- 
ferent from the population mean. 

Such a hypothesis is called a null hypothesis, since our experiment or 
computations undertake to nullify it. The hypothesis is never proved; 
neither is it disproved. Our inference merely casts much doubt upon it
(thereby impugning it) or casts little doubt upon it. 

In our further study of significance of differences we shall consider the 
significance of the difference between a sample value and an assumed 
population value, and the significance of the difference between two sample 
values. The procedure for testing the significance of the difference may 
be summarized into three steps: (1) Set up the hypothesis that the true 
difference is zero (i.e., that the sample has been drawn from the known or 
assumed population or that the two samples were drawn from the same 
population). (2) Upon the basis of this hypothesis, determine the prob- 
ability that such a difference as the one observed might occur because of 
sampling variations. (3) Draw a conclusion concerning the reasonableness
of the hypothesis. If such an observed difference could hardly have
occurred by chance, we have cast much doubt upon the hypothesis of (1). 
We therefore abandon the hypothesis and conclude that the observed dif- 
ference is significant. However, if such an observed difference could very 
often occur because of chance, we have cast very little doubt upon the hy- 
pothesis. We therefore continue to regard the hypothesis as tenable and 
conclude that the difference is not significant. 

Reliability of a sample mean, when $\bar{X}_p$ and $\sigma_p$ are unknown. The
illustration just discussed assumed that the mean and the standard
deviation of the population were known. Ordinarily these population values
are not known. Our actual knowledge is often limited to the values
computed from one (or a few) samples.⁴

It will be recalled that the standard error of a mean is computed by
referring to the standard deviation of the universe and the number of
items in the sample. Since we do not know the standard deviation for
the population, we estimate it from the sample. While sample means
vary around the population mean and may be either larger or smaller
than the population mean, a sample standard deviation tends to be smaller
than the standard deviation of the population.⁵ The standard deviation
of the population is estimated from the sample by the expression⁶

$$\hat{\sigma} = \sqrt{\frac{\Sigma x^2}{N - 1}},$$



⁴ Occasionally a population mean (or other measure) may be obtained by setting up
a control population. In manufacturing, for example, this is accomplished by
producing a large number of units under carefully controlled conditions.

⁵ The standard deviation of the sample is computed in relation to its own mean.
The sum of the squares of a set of deviations is a minimum when taken around
their mean. If these deviations were computed in relation to the population mean (be
it larger or smaller than the sample mean), the sum of their squares would be greater.
(See L. D. H. Weld, Theory of Errors and Least Squares, p. 161, Macmillan, New York,
1916.)

⁶ See Appendix B, section XII-2, for a development of this expression.
where $\hat{\sigma}$ is the standard deviation of the population as estimated from the
sample;

$\Sigma x^2$ is the sum of the squared deviations of each item in the sample
from the sample mean;

N is the number of items in the sample.


The expression N − 1 is a particular statement of the degrees of freedom
(n) present. When the arithmetic mean is computed, one degree of
freedom is lost, since the value of any one of the items is defined by knowledge
of the value of the mean and of the remaining items. In other words, if
for a series of items the sole requirement set up is that $\Sigma x = 0$ (that is,
the mean has been determined), the values of all of the items save
one may be arbitrarily set down; all but one are "free to vary." The
value of the other item is determined by the above requirement. In more
general language, "the number of degrees of freedom is the number of
deviations [items or N] minus the number of constants determined from
the sample and used to fix the points from which those deviations are
measured."⁷ The use of N − 1 in the above expression is particularly
important when N is small. When N is large, it matters little whether
we divide by N or N − 1.

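For purposes of computation⁸ the expression may be put in the following
forms.

For ungrouped data:

$$\hat{\sigma} = \sqrt{\frac{\Sigma X^2}{N - 1} - \frac{(\Sigma X)^2}{N(N - 1)}}, \quad \text{or} \quad \hat{\sigma} = \sqrt{\frac{N \Sigma X^2 - (\Sigma X)^2}{N(N - 1)}}, \quad \text{or} \quad \hat{\sigma} = \sqrt{\frac{\Sigma X^2 - \bar{X} \Sigma X}{N - 1}}.$$

For grouped data:

$$\hat{\sigma} = C \sqrt{\frac{\Sigma f (d')^2}{N - 1} - \frac{(\Sigma f d')^2}{N(N - 1)}}, \quad \text{or} \quad \hat{\sigma} = C \sqrt{\frac{N \Sigma f (d')^2 - (\Sigma f d')^2}{N(N - 1)}},$$

where $C$ is the class interval.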
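That the three computing forms for ungrouped data are the same estimate can be verified numerically. The five observations below are hypothetical and chosen only for easy arithmetic; the standard library's `statistics.stdev`, which also divides by N − 1, serves as an independent check.

```python
from math import sqrt
from statistics import stdev

x = [12.0, 15.0, 11.0, 18.0, 14.0]  # hypothetical observations

n = len(x)
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
x_bar = sum_x / n

form_a = sqrt(sum_x2 / (n - 1) - sum_x ** 2 / (n * (n - 1)))
form_b = sqrt((n * sum_x2 - sum_x ** 2) / (n * (n - 1)))
form_c = sqrt((sum_x2 - x_bar * sum_x) / (n - 1))

# Sample standard deviation with divisor N, then transformed to the estimate.
sigma = sqrt(sum_x2 / n - x_bar ** 2)
transformed = sigma * sqrt(n / (n - 1))

print(form_a, form_b, form_c, transformed, stdev(x))
```

With only five items the divisor matters: the standard deviation computed with N is noticeably smaller than the estimate computed with N − 1, which is the point made in the text about small samples.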


⁷ See L. H. C. Tippett, The Methods of Statistics, pp. 110-111, Williams and Norgate,
London, 1937 (2nd Edition).

⁸ For derivation, see Appendix B, section XII-3. When $\sigma$ has been computed for a
series of data and it is desired to transform it to $\hat{\sigma}$, use the expression

$$\hat{\sigma} = \sigma \sqrt{\frac{N}{N - 1}},$$

since multiplying $\sigma^2$ by N gives $\Sigma x^2$. Note that this transformation may be made
even when we do not have either the observed values of each item or a frequency
distribution.
In Table 68, data are shown of the weights at time of demobilization in
1919 of 746 soldiers of French extraction in the United States Army. The
computations of $\bar{X}$, $\sigma$, and $\hat{\sigma}$ appear below the table, and it will be noticed
that there is but little difference between $\sigma$ and $\hat{\sigma}$ because N is large. The
TABLE 68

Weight at Demobilization of 746 French* Soldiers Serving in the United
States Army During the World War

Weight in pounds         f      d'      fd'    f(d')²
100 and under 110        7      -4      -28      112
110 and under 120       39      -3     -117      351
120 and under 130      123      -2     -246      492
130 and under 140      181      -1     -181      181
140 and under 150      183       0        0        0
150 and under 160      122      +1      122      122
160 and under 170       59      +2      118      236
170 and under 180       19      +3       57      171
180 and under 190        5      +4       20       80
190 and under 200        5      +5       25      125
200 and over†            3      +6       18      108
Total                  746             -212    1,978

* Soldiers classified as French were either (1) born in France, or (2) had parents who were both born
in France, or (3) had three or four grandparents born in France.

† Mid-value taken as 205 pounds, giving results in agreement with those of the source.
Source: Charles B. Davenport and Albert G. Love, The Medical Department of the United States Army
in the World War, Vol. XV, Part 1, pp. 50 and 135, United States Government Printing Office, Washington,
1921.


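$$\bar{X} = 145 + \frac{\Sigma f d'}{N}(10) = 145 + \frac{-212}{746}(10) = 142.16 \text{ pounds}.$$

$$\sigma = 10 \sqrt{\frac{\Sigma f (d')^2}{N} - \left(\frac{\Sigma f d'}{N}\right)^2} = 10 \sqrt{\frac{1{,}978}{746} - \left(\frac{212}{746}\right)^2} = 16.03 \text{ pounds}.$$

$$\hat{\sigma} = 10 \sqrt{\frac{\Sigma f (d')^2}{N - 1} - \frac{(\Sigma f d')^2}{N(N - 1)}} = 10 \sqrt{\frac{1{,}978}{745} - \frac{(212)^2}{746 \cdot 745}} = 16.04 \text{ pounds}.$$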


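standard error of the mean is computed by using $\hat{\sigma}$ in place of $\sigma_p$, since
the latter is unknown:⁹

$$\sigma_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}}.$$

⁹ We may also obtain the standard error of the mean from

$$\sigma_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}} = \sqrt{\frac{\Sigma x^2}{N - 1}} \cdot \frac{1}{\sqrt{N}} = \frac{\sigma}{\sqrt{N - 1}}.$$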
For the soldiers of French extraction,

$$\sigma_{\bar{X}} = \frac{16.04}{\sqrt{746}} = \frac{16.04}{27.31} = .587,$$

which states that 2 times out of 3, sample means would fall within a range
of ±.587 pounds of the population mean; that 95 times out of 100,
sample means would fall within ±1.174 pounds ($\pm 2\sigma_{\bar{X}}$) of the population
mean; and that 99.7 times out of 100, sample means would fall within
±1.761 pounds ($\pm 3\sigma_{\bar{X}}$) of the population mean.
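The Table 68 computations can be reproduced directly from the frequency column. The sketch below takes the frequencies and step deviations from the table, with the origin at the mid-value 145 pounds and a class interval of 10 pounds; the variable names are illustrative.

```python
from math import sqrt

frequencies = [7, 39, 123, 181, 183, 122, 59, 19, 5, 5, 3]
step_deviations = list(range(-4, 7))  # d' = -4, -3, ..., +6
ORIGIN = 145           # pounds; mid-value of the class with d' = 0
CLASS_INTERVAL = 10    # pounds

n = sum(frequencies)
sum_fd = sum(f * d for f, d in zip(frequencies, step_deviations))
sum_fd2 = sum(f * d * d for f, d in zip(frequencies, step_deviations))

mean = ORIGIN + CLASS_INTERVAL * sum_fd / n
sigma = CLASS_INTERVAL * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
sigma_hat = CLASS_INTERVAL * sqrt(sum_fd2 / (n - 1) - sum_fd ** 2 / (n * (n - 1)))
std_error = sigma_hat / sqrt(n)

print(f"N = {n}, sum fd' = {sum_fd}, sum f(d')^2 = {sum_fd2}")
print(f"mean = {mean:.2f}, sigma = {sigma:.2f}, sigma_hat = {sigma_hat:.2f}")
print(f"standard error of the mean = {std_error:.3f}")
```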

Assuming that the sample mean is smaller than the population mean by
$3\sigma_{\bar{X}}$, the population mean would be 143.92 pounds. Assuming that the
sample mean is larger than the population mean by $3\sigma_{\bar{X}}$, the population
mean would be 140.40 pounds. Since 99.7 per cent of the sample means
vary from the population mean by not more than $\pm 3\sigma_{\bar{X}}$, we may conclude
that the mean weight of all such soldiers in the United States Army is
almost surely not less than 140.40 pounds or more than 143.92 pounds.
It should be noted that, while we are able to state the probability that
sample means may fall within a given range around the population mean,
it is not possible to give a statement of the probability that the population
mean falls within a given range of the sample mean, since there can be no
such thing as a distribution of population means around the sample mean.
We can say, however, that, if many such statements as the one concerning
$\bar{X}_p$ are made in regard to this (or some other) population, we should
expect to be correct in about 99.7 per cent of the instances. Statisticians
make use of the concept of fiducial probability to express their confidence
that the population mean falls within given limits. Fiducial (or
fiduciary) probability states a degree of reasonable expectation and should
not be confused with the mathematical probabilities referring to the
variations present in statistical measures computed from samples. For the
soldiers of French extraction, therefore, we may say that the fiducial
probability is a little over 95 per cent that the population mean lies
between 140.99 and 143.33 pounds; this is the mean of the sample (142.16
pounds) plus and minus $2\sigma_{\bar{X}}$. Similarly, the fiducial probability is 997
out of 1,000 that the population mean lies between 140.40 and 143.92
pounds ($\bar{X} \pm 3\sigma_{\bar{X}}$). Fiducial probabilities, which we have just used to
state the "fiducial limits" or "confidence limits" within which a population
mean might fall, must not be regarded as exact statements concerning the
probability that $\bar{X}_p$ falls within given limits but rather as an expression
of the degree of confidence which the statistician has in his conclusions.
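The limits quoted for the soldiers' weights follow from the sample mean and its standard error alone. A minimal sketch, with a hypothetical helper name, is:

```python
def fiducial_limits(sample_mean, std_error, k):
    """Limits at the sample mean plus and minus k standard errors."""
    return sample_mean - k * std_error, sample_mean + k * std_error

SAMPLE_MEAN = 142.16  # pounds
STD_ERROR = 0.587     # pounds

for k in (2, 3):
    lower, upper = fiducial_limits(SAMPLE_MEAN, STD_ERROR, k)
    print(f"mean +/- {k} standard errors: {lower:.2f} to {upper:.2f} pounds")
```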

Significance of the difference between a sample mean and a hypothetical 
population mean. Tests were made of a sample of 50 pieces of a certain 
type of steel wire and revealed a mean strength of 1,221 pounds and $\hat{\sigma}$ of
49.28 pounds. These wires are to be used for "spinning" wire cable and
it is essential that the mean value of the population from which the sample
was taken should be at least 1,215 pounds. We wish to assure ourselves,
then, that the sample mean is significantly greater than 1,215 pounds.
What are the chances that such an observed mean may exceed the
population mean by 6 pounds or more? Note that in this instance we are
interested in knowing the chances that such an observed mean may exceed the
population mean by 6 pounds, not the chances that such an observed mean
may differ from (either exceed or fall below) the population mean by 6
pounds. Obviously the latter would have twice the probability of the
former. The value of $\sigma_{\bar{X}}$ is

$$\sigma_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}} = \frac{49.28}{\sqrt{50}} = \frac{49.28}{7.07} = 6.97 \text{ pounds}.$$


This figure indicates the dispersion of sample means about the population
mean, not the spread of sample means around the given sample mean.
The population mean is unknown, and so we shall proceed by assuming
it to be 1,215 pounds and shall ascertain if a variation as great as 6 pounds
could occur in a sample by chance. This difference is .86 times the value
of $\sigma_{\bar{X}}$. Since the sampling distribution of means is essentially normal, we
look up .86 in Appendix E, which gives the areas of the normal curve.
The tabled value .3051 indicates that there are 19.5 chances (.5000
− .3051 = .1949) out of 100 that a sample mean may have a value of
1,221 pounds or more, if the population mean is 1,215 pounds. The
chances of such an occurrence being this large, we conclude that the
population mean may quite possibly be as low as 1,215 pounds. The process
of reasoning which we have just discussed is shown graphically in section A
of Chart 132.

The above does not give us reason to conclude that the difference of 6
pounds between the sample mean and the assumed population mean is
significant. The difference might become significant if we enlarge the
sample. If, now, N = 400 while $\bar{X}$ remains 1,221 pounds and $\hat{\sigma}$ is 49.28
pounds, as before, we have

$$\sigma_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}} = \frac{49.28}{\sqrt{400}} = 2.46 \text{ pounds}.$$


The ratio of the difference (6 pounds) to the standard error of the mean is
$\frac{6}{2.46} = 2.44$, which indicates that there are 73 chances out of 10,000 (or
just under 1 in 100) that a sample mean may have a value of 1,221 pounds
or more, if the population mean is 1,215 pounds. This is shown in section
B of Chart 132. The chances of such an occurrence being this small, we
conclude that there is a significant difference and that the population mean
is unlikely to be so low as 1,215 pounds. The above two problems could
have been attacked by use of fiducial probability and the conclusions would
have been the same.

Chart 132. Expected Distribution of Sample Means of Tensile Strength of Steel
Wire and Chances of Obtaining Sample Means Differing from the Population Mean
by +6 Pounds: A, When N = 50, $\bar{X}$ = 1,221 Pounds, $\hat{\sigma}$ = 49.28 Pounds, $\sigma_{\bar{X}}$ = 6.97
Pounds; B, When N = 400, $\bar{X}$ = 1,221 Pounds, $\hat{\sigma}$ = 49.28 Pounds, $\sigma_{\bar{X}}$ = 2.46 Pounds.
(Horizontal axes: values of sample means, in pounds, centered on the assumed
population mean of 1,215.)
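The effect of enlarging the sample in the steel wire example can be seen by computing the one-tail probability for both sample sizes. This sketch uses `statistics.NormalDist` instead of Appendix E and does not round the ratio before looking up the area, so its results (about .195 and .0074) differ slightly from the tabled .1949 and .0073. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def chance_of_exceeding(difference, sigma_hat, n):
    """One-tail probability that a sample mean exceeds the assumed
    population mean by `difference` or more."""
    std_error = sigma_hat / sqrt(n)
    ratio = difference / std_error
    return std_error, ratio, 1 - NormalDist().cdf(ratio)

for n in (50, 400):
    std_error, ratio, p = chance_of_exceeding(1221 - 1215, 49.28, n)
    print(f"N = {n}: standard error {std_error:.2f}, ratio {ratio:.2f}, chance {p:.4f}")
```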

In the illustration just discussed, we found that 73 times out of 10,000
an event might occur owing to chance and we concluded that chance was
therefore ruled out as an explanation. Just how small should the
probabilities be before we regard chance as having been virtually eliminated as
a cause?

Some authorities recommend that the difference between the
population mean and the observed mean should be at least two times the standard
error of the sample mean before it is regarded as significant. If we are testing
whether a value may either exceed
or fall below the population mean, the criterion of "twice the standard
error" indicates that such a difference may occur, through chance, 454
times out of 10,000 (see Appendix E). This is 4.54 chances out of 100
and is often referred to as the ".05 level of significance." More accurately,
the .05 level is at $1.96\sigma_{\bar{X}}$. It must be obvious that, if we are computing
the probability that this observed sample mean might exceed the
population mean by chance by an amount equal to 1.96 times the standard error
or more, the chances are 250 out of 10,000 (2.50 out of 100). Similarly, the
chances that the sample mean might fall below the population mean by as
much as $1.96\sigma_{\bar{X}}$ or more are 250 out of 10,000. Since it cannot be held that
a sample mean is significantly greater than the population mean unless the
sample mean is also significantly different from the population mean, it
would seem to follow that the $\frac{x}{\sigma_{\bar{X}}}$ ratio which is required to establish a
significant difference in one direction only is the same ratio that is required
to show a difference in either direction.

For a more rigid interpretation, others suggest that the observed
difference should be 2.58 times its standard error. If this is so, then (from
Appendix E) the probability is 98 out of 10,000, or about 1 out of 100,
that the difference (either + or −) might have occurred by chance.
This is referred to as the ".01 level of significance." An observed sample
mean might exceed the population mean by $2.58\sigma_{\bar{X}}$, or more, .49 times out
of 100; and might fall below the population mean by $2.58\sigma_{\bar{X}}$, or more, .49 times
out of 100.
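The relation between a level of significance and the required multiple of the standard error can be computed in either direction. The sketch below uses `statistics.NormalDist` (standard library, Python 3.8 or later); the function names are illustrative, and the results agree with Appendix E only to the table's rounding.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def two_tail_probability(ratio):
    """Chance of a difference, in either direction, of `ratio` standard errors or more."""
    return 2 * (1 - STANDARD_NORMAL.cdf(ratio))

def critical_ratio(level):
    """Multiple of the standard error cutting off `level` in the two tails combined."""
    return STANDARD_NORMAL.inv_cdf(1 - level / 2)

for ratio in (2.0, 1.96, 2.58):
    print(f"ratio {ratio:.2f}: two-tail chance {two_tail_probability(ratio):.4f}")

for level in (0.05, 0.01):
    print(f"level {level:.2f}: critical ratio {critical_ratio(level):.3f}")
```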

Perhaps most satisfactory of all is to ascertain the probability that an 
observed sample mean might occur because of chance, and then to decide 
whether or not the probability is small enough for the particular problem at 
hand. If a test is run of the strength of window-sash cord (used to connect 
the upper or lower sash of a window and the sash weight), the investigator 
would doubtless be satisfied if the probabihty of obtaining a sample value 
as far above a specified minimum standard as that encountered were 5 in 
100. On the other hand, if tests are made of the strength of parachute 
cord, the probability of the observed divergence should be much less. 
The failure of window-sash cord involves inconvenience and expense; the 
failure of parachute cord means tragedy. 

Significance of the difference between two sample means. The
intelligence quotient (I.Q.) ratings of a group of 68 left-handed students showed
a mean of 110.52. A similar group of 68 right-handed students¹⁰ had a
mean I.Q. of 109.48. Are the left-handed students actually above the
right-handed in respect to I.Q. rating, or is the difference so slight as
perhaps to have been due to chance variations arising in sampling? For
left-handed students, $N_1$ = 68, $\bar{X}_1$ = 110.52, $\hat{\sigma}_1$ = 15.2. For right-handed
students, $N_2$ = 68, $\bar{X}_2$ = 109.48, $\hat{\sigma}_2$ = 15.5.

One way to attack this problem would be to compute $\sigma_{\bar{X}_1}$ for the
left-handed group and to ascertain, by fiducial probabilities, how much lower
than 110.52 the population mean might be (for example, $110.52 - 3\sigma_{\bar{X}_1}$).
Then compute $\sigma_{\bar{X}_2}$ for the right-handed group and determine, by fiducial
probabilities, how much above 109.48 the population mean for these students
might be (for example, $109.48 + 3\sigma_{\bar{X}_2}$). If the first value still
exceeds the second, we may be reasonably certain that a real difference
exists. But, suppose the two values just meet, or overlap slightly.
Difficulties of interpretation immediately arise.

A much better procedure consists of determining the value of the standard
error of the difference between the two sample means and comparing
this with the observed difference between the sample means. The standard
error of the difference between the two means, $\sigma_{\bar{X}_1 - \bar{X}_2}$, is given by

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}.$$

For the left-handed students:

$$\sigma_{\bar{X}_1} = \frac{15.2}{\sqrt{68}} = \frac{15.2}{8.246} = 1.84.$$

For the right-handed students:

$$\sigma_{\bar{X}_2} = \frac{15.5}{\sqrt{68}} = \frac{15.5}{8.246} = 1.88.$$

Based on data from Ralph Haefner, The Educational Significance of Left-Handedness,
p. 28, Teachers College, Columbia University, New York, 1929.

See Appendix B, section XII-4, where it is shown that
$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2}$,
provided there is no correlation between paired sample means. This will tend to be
the situation when there is no inherent pairing between the items of the two series
being compared.


If pairing exists between the items of the two samples being compared and there is
correlation between these paired items, it may or may not be true that correlation
exists between paired means of many such pairs of samples. In such a case we may
compute the difference between each pair of items in the two samples, and ascertain
from these differences their mean $\bar{X}_D$ and their standard deviation $\sigma_D$ or $\hat{\sigma}_D$. The
value of $\sigma_{\bar{X}_D}$ is then determined, and $\bar{X}_D$ is compared with $\sigma_{\bar{X}_D}$ to ascertain if
$\bar{X}_D$ differs significantly from zero. Alternatively, we may apply the test described in
the text above. If either of these tests discredits the null hypothesis, we should not
ignore its testimony.

The standard error of the difference is:

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{(1.84)^2 + (1.88)^2} = \sqrt{3.386 + 3.534} = \sqrt{6.920} = 2.63.$$


Having now the two values: (1) the observed difference between the
two means $\bar{X}_1 - \bar{X}_2 = 110.52 - 109.48 = 1.04$, and (2) the standard
error of the difference between the two means = 2.63, we are in a position
to answer the following question:

If the true difference between the means is zero, what is the probability that
$\bar{X}_1$ might exceed $\bar{X}_2$ by 1.04 or more because of chance variations?

The ratio of the observed difference to the standard error of the differ- 
ence, 


X _ Xx- X2 


1.04 

2.63 


.395, 


indicates the point on the curve beyond which all X values are as great 
or greater than the observed difference (see Chart 133). If we look up 



[Chart 133. Expected Distribution of Differences Between Sample Means of
Intelligence Quotients of Left-Handed and Right-Handed Students and Chances of
Obtaining a Difference of +1.04, When $\sigma_{\bar{X}_1 - \bar{X}_2} = 2.63$. Horizontal axis:
values of $\bar{X}_1 - \bar{X}_2$, from −7.89 to +7.89.]

.395 in the table of areas of the normal curve (Appendix E) and subtract 
the result from .5000, we shall know the fraction of the area of the curve 
which is cross-hatched in Chart 133 and we shall have the answer to our 

question. Looking up $x/\sigma = .395$ in Appendix E, we find .1536. This means
that 15 out of 100 sample differences would fall between $\bar{X}_1 - \bar{X}_2 = 0$
and $\bar{X}_1 - \bar{X}_2 = +1.04$, and consequently 50 − 15, or 35 out of 100,



320 


RELIABILITY AND SIGNIFICANCE 


[Chap. 12 


would occur beyond +1.04. If the I.Q.'s of left-handed and right-handed
students are actually identical, we should expect sample means of
left-handed students to exceed those for right-handed students 50 times out
of 100. It appears, then, that if the I.Q.'s of left-handed and right-handed
students are actually identical, we might get differences between sample
means as great as this and in favor of left-handed students 35 times out


TABLE 69

Weight at Demobilization of 1,821 Scotch* Soldiers Serving in the United
States Army During the World War

Weight in pounds          f       d'      fd'     f(d')²
100 and under 110        12       -4      -48       192
110 and under 120        79       -3     -237       711
120 and under 130       254       -2     -508     1,016
130 and under 140       436       -1     -436       436
140 and under 150       404        0        0         0
150 and under 160       308       +1      308       308
160 and under 170       175       +2      350       700
170 and under 180        89       +3      267       801
180 and under 190        37       +4      148       592
190 and under 200        19       +5       95       475
200 and over†             8       +6       48       288
Total                 1,821               -13     5,519

* Soldiers classified as Scotch were either (1) born in Scotland, or (2) had parents who were both born
in Scotland, or (3) had three or four grandparents born in Scotland.

† Mid-value taken as 205 pounds, giving results in agreement with those of the source.

Source: Charles B. Davenport and Albert G. Love, The Medical Department of the United States Army
in the World War, Vol. XV, Part 1, pp. [illegible] and 135, Government Printing Office, Washington,
1921.

$$\bar{X} = 145 + \left(\frac{-13}{1,821}\right)10 = 144.93 \text{ pounds.}$$

$$\sigma = 10\sqrt{\frac{5,519}{1,821} - \left(\frac{-13}{1,821}\right)^2} = 17.41 \text{ pounds.}$$

$$\hat{\sigma} = 10\sqrt{\frac{5,519}{1,820} - \frac{(-13)^2}{1,821 \cdot 1,820}} = 17.41 \text{ pounds.}$$


of 100, owing to chance factors. This difference may have been due to
chance and there is thus no definite evidence (from these data) that the
I.Q. of the left-handed students is superior to that of right-handed students.
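
The whole I.Q. comparison can be retraced in a few lines. This is a sketch only, assuming the normal curve applies; the variable and function names are illustrative, and the figures are those quoted in the text.

```python
import math

def normal_upper_tail(z: float) -> float:
    """Area of the normal curve lying beyond +z standard errors."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Sample figures quoted in the text for the two groups of students.
n1, mean1, sd1 = 68, 110.52, 15.2   # left-handed
n2, mean2, sd2 = 68, 109.48, 15.5   # right-handed

se1 = sd1 / math.sqrt(n1)           # standard error of the first mean
se2 = sd2 / math.sqrt(n2)           # standard error of the second mean
se_diff = math.hypot(se1, se2)      # square root of the sum of the squared errors
ratio = (mean1 - mean2) / se_diff   # observed difference in standard-error units
p_upper = normal_upper_tail(ratio)  # chance of a difference this large in favor of group 1

print(f"se_diff = {se_diff:.2f}, ratio = {ratio:.3f}, chance = {p_upper:.2f}")
```

Carrying unrounded intermediate values changes the ratio only in the third decimal place, so the conclusion is unaffected.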

It will be recalled that, for the 746 soldiers of French extraction, $\bar{X}$
= 142.16 pounds and $\hat{\sigma}$ = 16.04 pounds. Table 69 shows a distribution
of the weights of a group of 1,821 United States soldiers of Scotch
extraction. The measurements were taken at demobilization in 1919, as were
the others. From this table it is computed that $\bar{X}$ = 144.93 pounds and
$\hat{\sigma}$ = 17.41 pounds. Is there a real difference in favor of the soldiers of
Scotch extraction, or is the difference so slight that it may be attributable
to chance? If the difference is significant, we may regard the two groups
as clearly different; if it is not significant, we conclude that they may have
been drawn from the same population. We shall proceed exactly as in
the preceding example, computing $\sigma_{\bar{X}}$ for each series, then $\sigma_{\bar{X}_1 - \bar{X}_2}$, and
finally comparing the observed difference of the means of the two series
with the standard error of the difference. Summarizing:


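Soldiers of Scotch extraction
    $N_1 = 1,821$
    $\bar{X}_1 = 144.93$ pounds
    $\hat{\sigma}_1 = 17.41$ pounds
    $\sigma_{\bar{X}_1} = \dfrac{17.41}{\sqrt{1,821}} = .408$ pounds

Soldiers of French extraction
    $N_2 = 746$
    $\bar{X}_2 = 142.16$ pounds
    $\hat{\sigma}_2 = 16.04$ pounds
    $\sigma_{\bar{X}_2} = \dfrac{16.04}{\sqrt{746}} = .587$ pounds

$\bar{X}_1 - \bar{X}_2 = 144.93 - 142.16 = 2.77$ pounds
$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{(.587)^2 + (.408)^2} = .715$ pounds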



[Chart 134. Expected Distribution of Differences Between Sample Means of Weight
of American Soldiers of Scotch and French Extraction and Chances of Obtaining a
Difference of +2.77 Pounds, When $\sigma_{\bar{X}_1 - \bar{X}_2} = .715$ Pounds.]

The soldiers of Scotch extraction were 2.77 pounds heavier, on the average,
than were those of French extraction. If the true difference between
the mean weights of these two groups is zero, what are the chances that
an observed difference of 2.77 in favor of the Scotch might occur, due to
variations arising from sampling? The ratio of the observed difference to
the standard error of the difference is

$$\frac{x}{\sigma} = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{2.77}{.715} = 3.87.$$

From the table of areas of the normal curve, we find that 49,994/100,000 of a
normal curve is between $x/\sigma = 0$ and $x/\sigma = 3.87$. Beyond the $x/\sigma = 3.87$ point,
then, there are 50,000 − 49,994 or 6 out of 100,000 occurrences. This is
shown in Chart 134. It is quite apparent that a difference of 2.77 pounds
in favor of the Scotch soldiers could hardly occur fortuitously and there is
thus a clearly significant difference between the two groups.

Procedure when $N_1 \neq N_2$. When testing the hypothesis that both
samples have been drawn from the same population (i.e., that the true
difference between the means is zero), we have used the formula

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\hat{\sigma}_1^2}{N_1} + \frac{\hat{\sigma}_2^2}{N_2}}.$$

It is, however, more accurate to utilize all available information to make
one estimate of the variance of the parent population from the variances
of the two samples taken together, and to substitute that in place of the
two separate estimates in the above formula. The formula then becomes

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\hat{\sigma}_{1+2}^2}{N_1} + \frac{\hat{\sigma}_{1+2}^2}{N_2}},$$

in which $\hat{\sigma}_{1+2}^2$ is the average of the two estimates of population variance
computed by the expression

$$\hat{\sigma}_{1+2}^2 = \frac{\Sigma x_1^2 + \Sigma x_2^2}{(N_1 - 1) + (N_2 - 1)}.$$

When $N_1 = N_2$, the results are identical regardless of whether two separate
estimates or one combined estimate of variance is used; but when
$N_1 \neq N_2$, in order to obtain maximum accuracy $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ must be weighted
by their respective degrees of freedom when we are averaging them to
obtain the final estimate of $\hat{\sigma}_{1+2}^2$. The result is the more complicated
expression

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(N_1 + N_2)(\Sigma x_1^2 + \Sigma x_2^2)}{N_1 N_2 [(N_1 - 1) + (N_2 - 1)]}}.$$

While this expression introduces an element of increased rigor into the
method, it is not apt to alter the conclusion appreciably when the N's are
large, but may be important when the N's are small and unequal. Very
rarely do we have to compare series with N's as greatly different as for
the soldiers of French and Scotch extraction. The value of $\Sigma x_1^2$ (for the
Scotch) may be obtained from

$$\Sigma x_1^2 = \Sigma X_1^2 - \frac{(\Sigma X_1)^2}{N_1}$$

or, since we are dealing with a frequency distribution, from

$$\Sigma x_1^2 = \left[\Sigma f(d')^2 - \frac{(\Sigma f d')^2}{N_1}\right] i^2,$$

where $i$ is the width of the class interval, and similarly for $\Sigma x_2^2$ for the
French. [The value of $\Sigma x_1^2$ may also be obtained from $N_1\sigma_1^2$ or
$(N_1 - 1)\hat{\sigma}_1^2$, and similarly for $\Sigma x_2^2$.] Referring to
Table 69, we find the values for the Scotch group are:

$$\Sigma x_1^2 = 100\left[5,519 - \frac{(-13)^2}{1,821}\right] = 100\,(5,519 - .09281) = 551,890.72.$$

See Appendix B, section XII-5.
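
The grouped-data arithmetic for the Scotch soldiers can be reproduced directly from the frequencies of Table 69. The sketch below assumes the class interval of 10 pounds and the assumed mean of 145 pounds implied by the table; the variable names are illustrative.

```python
import math

# Frequencies from Table 69 and step deviations d' from the assumed mean.
freqs = [12, 79, 254, 436, 404, 308, 175, 89, 37, 19, 8]
steps = list(range(-4, 7))            # -4, -3, ..., +6
assumed_mean, width = 145.0, 10.0     # mid-value of the 140-150 class; class interval

n = sum(freqs)
sum_fd = sum(f * d for f, d in zip(freqs, steps))
sum_fd2 = sum(f * d * d for f, d in zip(freqs, steps))

mean = assumed_mean + width * sum_fd / n
sum_sq = width ** 2 * (sum_fd2 - sum_fd ** 2 / n)   # sum of squared deviations from the mean
sd_hat = math.sqrt(sum_sq / (n - 1))                # estimate using N - 1

print(n, sum_fd, sum_fd2)
print(f"mean = {mean:.2f}, sum of squares = {sum_sq:.2f}, sd estimate = {sd_hat:.2f}")
```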

Taking the necessary totals from Table 68, we find the values for the
French group are:

$$\Sigma x_2^2 = 100\left[1,978 - \frac{(\Sigma f d')^2}{746}\right] = 100\,(1,978 - 60.24665) = 191,775.34.$$


The value of $\sigma_{\bar{X}_1 - \bar{X}_2}$ may now be determined:

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(1,821 + 746)(551,890.72 + 191,775.34)}{(1,821)(746)(1,820 + 745)}}
= \sqrt{\frac{(2,567)(743,666.06)}{(1,358,466)(2,565)}}
= \sqrt{\frac{1,908,990,776.02}{3,484,465,290}}
= \sqrt{.547858} = .740.$$


Comparing the observed difference between the means to this value gives
the ratio

$$\frac{x}{\sigma} = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{2.77}{.740} = 3.74.$$

Referring to the table of areas of the normal curve, it appears that 49,991/100,000
of the area of the curve occurs between $x/\sigma = 0$ and $x/\sigma = +3.74$. Beyond
$x/\sigma = +3.74$, therefore, we should find 9/100,000 of the area. Since there is
about one chance out of 10,000 that the observed difference in favor of the



Scotch might occur fortuitously, it follows that the difference is clearly
significant. It may be observed that the probability obtained by this
method and the probability obtained by the preceding method are in
rather close agreement even though the N's are respectively 746 and 1,821.
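
The two ways of forming the standard error of the difference can be set side by side. This is a sketch under the assumption that the sums of squared deviations quoted in the text (551,890.72 and 191,775.34) are taken as given; the names are illustrative.

```python
import math

n1, n2 = 1821, 746                        # Scotch, French
sum_sq1, sum_sq2 = 551_890.72, 191_775.34
mean_diff = 144.93 - 142.16

# Separate estimates: each variance is divided by its own N.
var1 = sum_sq1 / (n1 - 1)
var2 = sum_sq2 / (n2 - 1)
se_separate = math.sqrt(var1 / n1 + var2 / n2)

# One pooled estimate, weighted by degrees of freedom.
pooled_var = (sum_sq1 + sum_sq2) / ((n1 - 1) + (n2 - 1))
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

ratio_separate = mean_diff / se_separate
ratio_pooled = mean_diff / se_pooled
print(f"separate: se = {se_separate:.3f}, ratio = {ratio_separate:.2f}")
print(f"pooled:   se = {se_pooled:.3f}, ratio = {ratio_pooled:.2f}")
```

With samples this large either form leads to the same verdict; the pooled form matters chiefly when both N's are small and unequal.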

Reliability of the mean of a stratified sample. What has been said up
to this point has dealt with the reliability of means based on random
samples. Sometimes, however, we can obtain even more reliable results
by using a stratified sample. Thus, when the population can be divided
into pertinent categories or strata, we may select samples at random from
each of these strata. This procedure is especially apropos when the
population is heterogeneous and is composed of a number of strata each
relatively homogeneous. A stratified sample is usually so selected that the
number included in the sample from each stratum is proportional to the
numerical importance of that stratum in the population.

When information concerning the population is available, the population
variance, used in computing the standard error of the mean, is based on
the deviations of the items within each stratum from the mean of that
stratum, rather than from the mean of the entire population. Referring
to this as $\sigma_{st}^2$, to distinguish it from $\sigma_p^2$ computed in relation to the mean of
the entire population, we have

$$\sigma_{st}^2 = \frac{\sum_{1}^{m}\sum_{a}^{P_s}(X - \bar{X}_s)^2}{P},$$

where $P_s$ is the number of items in a stratum, $\bar{X}_s$ is the mean of a stratum,
$\sum_{a}^{P_s}$ indicates a summation from item $a$ to item $P_s$ in a stratum, and
$\sum_{1}^{m}$ indicates a summation from stratum 1 to stratum $m$.

Now if $(X - \bar{X}_s)^2$ is summed for a particular stratum the value obtained
is $P_s\sigma_s^2$, where $\sigma_s^2$ is the variance of the stratum. Also, $\Sigma P_s = P$, the
number of items in the population, and we may write

$$\sigma_{st}^2 = \frac{\sum_{1}^{m} P_s\sigma_s^2}{P}.$$

Therefore

$$\sigma_{\bar{X}\ \text{of a stratified sample}} = \frac{\sigma_{st}}{\sqrt{N}} = \sqrt{\frac{\sum_{1}^{m} P_s\sigma_s^2}{P \cdot N}},$$

where N is the number of items in the entire sample. In Appendix B,
Section XII-6, it is shown that

$$\sigma^2_{\bar{X}\ \text{of a stratified sample}} = \frac{\sigma_p^2}{N} - \frac{\sigma^2_{\text{of strata means}}}{N},$$

where

$$\sigma^2_{\text{of strata means}} = \frac{\sum_{1}^{m} P_s(\bar{X}_s - \bar{X}_p)^2}{P}.$$


Either of these expressions is satisfactory (since they are equivalents)
when population data are available. However, when we have to base our
computations on sample data we refer to the procedure used for two samples
on page 322 and merely extend the expression to cover several strata as
if they were several samples. Using the symbol $\hat{\sigma}^2_{1+2+\cdots+m}$ to indicate the
estimate of the within-strata variance based on several samples, we have

$$\hat{\sigma}^2_{1+2+\cdots+m} = \frac{\Sigma x_1^2 + \Sigma x_2^2 + \cdots + \Sigma x_m^2}{(N_1 - 1) + (N_2 - 1) + \cdots + (N_m - 1)},$$

and the standard error of the mean of a stratified sample becomes

$$\sigma_{\bar{X}\ \text{of a stratified sample}} = \frac{\hat{\sigma}_{1+2+\cdots+m}}{\sqrt{N}}$$

if the population is large in relation to the sample.
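
A small sketch may make the sample-based procedure concrete. The three strata below are hypothetical figures invented purely for illustration (they do not come from the text), and the function name `stratified_standard_error` is likewise an assumption.

```python
import math

def stratified_standard_error(strata):
    """Standard error of the mean of a stratified sample, pooling within-stratum variation."""
    total_n = sum(len(s) for s in strata)
    within_ss = 0.0
    for s in strata:
        stratum_mean = sum(s) / len(s)
        # Deviations are measured from the stratum's own mean, not the grand mean.
        within_ss += sum((x - stratum_mean) ** 2 for x in s)
    degrees_of_freedom = sum(len(s) - 1 for s in strata)
    pooled_sd = math.sqrt(within_ss / degrees_of_freedom)
    return pooled_sd / math.sqrt(total_n)

# Hypothetical sample drawn from three strata (illustrative numbers only).
sample = [[10, 12, 14], [20, 22, 24, 26], [30, 31, 32]]
se = stratified_standard_error(sample)
print(f"standard error of the stratified mean = {se:.3f}")
```

Because differences between the stratum means are excluded from the pooled variation, the result is smaller than the standard error that would be computed by treating the same ten items as one simple random sample.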


Reliability of Sample Means, Small Samples 


The t distribution. In the preceding discussion it was assumed that the
sampling distribution of $\dfrac{\bar{X} - \bar{X}_p}{\sigma_{\bar{X}}}$ is normal, which is essentially true when
N is large but not when N is small. Similarly, the distribution of
$\dfrac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}}$
is effectively normal when N is large but not when N is small. In each
instance the lack of normality arises out of the fact that, in computing
the value of $\sigma_{\bar{X}}$, we use $\hat{\sigma}$ (the estimate of the standard deviation of the
population) in place of the true value of the standard deviation of the
population $\sigma_p$. When N is small, the sampling distribution follows the
t curve, which is more widely dispersed than the normal curve (see Chart
135) and becomes more so in inverse relation to the degrees of freedom
present. As may be seen from the chart, there is a greater proportion of
the t curve beyond any given deviation from the mean than there is
in the case of the normal curve. A table (called the t table) has been
devised, which gives for various degrees of freedom (n) the probability
that observed values differing from zero (either positively or negatively)
may occur owing to chance. If a quantity is distributed normally in terms


For a more detailed discussion, see R. A. Fisher, Statistical Methods for Research
Workers, pp. 124-125, Oliver and Boyd, Edinburgh, 1938 (7th Edition).

[Chart 135. Comparison of the Normal Distribution and the t Distribution when
n = 20, n = 5, and n = 2. The ordinates of the t distribution are obtained by an
expression (not legible here) that involves the ratio
$\left(\frac{n-1}{2}\right)! \Big/ \left(\frac{n-2}{2}\right)!$ and gives a maximum ordinate of
1.0000, comparable to the expression for the normal curve. The symbol $t$
in the expression for the t distribution is the same as $x/\sigma$ in the expression for the normal
curve. The computation of this ratio may be clarified by an illustration. If n = 11,
the numerator is 5!, while the denominator is 4.5!. The value of 4.5! is given by
4.5 × 3.5 × 2.5 × 1.5 × .5 × $\sqrt{\pi}$.]

of its actual standard error around a mean of zero, t is the ratio of that
quantity to an estimate of its standard error. Thus, while

$$\frac{\bar{X} - \bar{X}_p}{\sigma_p / \sqrt{N}}$$

is distributed normally,

$$\frac{\bar{X} - \bar{X}_p}{\hat{\sigma} / \sqrt{N}}$$

is distributed according to the t distribution. Since $\hat{\sigma}$ differs little from $\sigma_p$
when N is large, this is an important consideration only when N is small.

An examination of this table (shown in Appendix F) or of Chart 135
will reveal the wider range of the t distribution in comparison with the
normal distribution. In a normal distribution, 99 per cent of a series of
sample means would be within $\pm 2.6\sigma_{\bar{X}}$ of the population mean; in a t
distribution, with n = 10 (N = 11), 99 per cent would be within $\pm 3.2\sigma_{\bar{X}}$,
a somewhat wider range. This table should be used whenever the value
of N is small; that is to say, when the degrees of freedom are 30 or fewer,
although the difference between this table and the table of areas of the
normal curve does not become appreciable until N becomes less than 20.
This may be seen by referring to the t table in Appendix F. The last line
of the t table, which shows the values of t when n = ∞, gives the same
ratios as are to be had for $x/\sigma$ from the table of areas of the normal curve.
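
For readers without a t table at hand, the tail areas can be approximated numerically. The sketch below uses the standard density of Student's t with n degrees of freedom and Simpson's rule; it is an illustration under those assumptions, not the method used to build Appendix F, and the function names are hypothetical.

```python
import math

def t_density(t: float, n: int) -> float:
    """Ordinate of Student's t distribution with n degrees of freedom (unit total area)."""
    coef = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return coef * (1 + t * t / n) ** (-(n + 1) / 2)

def t_two_sided_tail(t: float, n: int, steps: int = 2000) -> float:
    """Chance of a value beyond +t or -t, by Simpson's rule (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, n) + t_density(upper, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

# A fractional factorial such as 4.5! is the gamma function evaluated at 5.5.
half_factorial = 4.5 * 3.5 * 2.5 * 1.5 * 0.5 * math.sqrt(math.pi)
print(math.isclose(math.gamma(5.5), half_factorial))

print(f"n = 10, t = 3.169: {t_two_sided_tail(3.169, 10):.4f}")
print(f"normal, x/sigma = 2.576: {math.erfc(2.576 / math.sqrt(2)):.4f}")
```

Both printed probabilities come out near .01, which illustrates why a wider multiple of the standard error is needed when only 10 degrees of freedom are present.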

Reliability of a sample mean when N is small. Tests have been made
of the breaking strength of ten pieces of hard-drawn copper wire as shown
in Table 70. The mean of this sample is 575.2 pounds, and it is desired
to ascertain the reliability of this sample mean. As shown below the
table, $\hat{\sigma}$ = 8.70 pounds and $\sigma_{\bar{X}}$ = 2.75 pounds. The value of $\sigma$ is 8.26
pounds, and it is to be noted that this value is appreciably smaller than $\hat{\sigma}$
because N is small.

In studying the reliability of the mean from this small sample, let us
first assume that we know the mean of an entire population and test the
divergence of our sample mean from this population mean. Suppose that
the population mean is known to be 577.0. Is the value of the sample
mean (575.2) sufficiently different to invalidate the hypothesis that the
sample was drawn from this population? The deviation of the sample
mean is −1.8 pounds, which is .65 times the standard error of the mean.
By referring to the t table of Appendix F, for n = 9 (n = N − 1 since 1
degree of freedom was lost when $\bar{X}$ was computed) and t = .65, we find,
by interpolating, that there are about 53 chances out of 100 that a mean
from such a sample may differ ±1.8 pounds or more from the population
mean. There are about 26 chances out of 100 that such a sample mean
might fall below the population mean by 1.8 pounds or more. There is,

TABLE 70

Breaking Strength of 10 Specimens of .104-Inch
Diameter Hard-drawn Copper Wire

Specimen     Breaking strength in pounds, X          X²
    1                   578                        334,084
    2                   572                        327,184
    3                   570                        324,900
    4                   568                        322,624
    5                   572                        327,184
    6                   570                        324,900
    7                   570                        324,900
    8                   572                        327,184
    9                   596                        355,216
   10                   584                        341,056
Total                 5,752                      3,309,232

Source: American Society for Testing Materials, Supplements to 1933
A.S.T.M. Manual on Presentation of Data, “Supplement A, Presenting
Plus and ...,” p. 1, reprinted ... American Society for Testing
Materials, Vol. ..., 1935.

$$\bar{X} = \frac{5,752}{10} = 575.2 \text{ pounds.}$$

$$\sigma = \sqrt{\frac{3,309,232}{10} - \left(\frac{5,752}{10}\right)^2} = 8.26 \text{ pounds.}$$

$$\hat{\sigma} = \sqrt{\frac{3,309,232}{9} - \frac{(5,752)^2}{10 \cdot 9}} = 8.70 \text{ pounds.}$$

$$\sigma_{\bar{X}} = \frac{8.70}{\sqrt{10}} = 2.75 \text{ pounds.}$$


consequently, no clear evidence of a significant divergence of the sample 
mean from the population mean, and the sample may well have been 
drawn from this population. 
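
The figures for the copper wire can be verified with the standard library. This is a sketch only; the population mean of 577.0 pounds is the value supposed in the text, and the variable names are illustrative.

```python
import math
import statistics

# Breaking strengths in pounds from Table 70.
strengths = [578, 572, 570, 568, 572, 570, 570, 572, 596, 584]

mean = statistics.fmean(strengths)
sd_sample = statistics.pstdev(strengths)     # divisor N: dispersion of the sample itself
sd_estimate = statistics.stdev(strengths)    # divisor N - 1: estimate for the population
se_mean = sd_estimate / math.sqrt(len(strengths))

supposed_population_mean = 577.0
t_ratio = (mean - supposed_population_mean) / se_mean
degrees_of_freedom = len(strengths) - 1

print(f"mean = {mean:.1f}, sd (N) = {sd_sample:.2f}, sd (N - 1) = {sd_estimate:.2f}")
print(f"standard error = {se_mean:.2f}, t = {t_ratio:.2f}, n = {degrees_of_freedom}")
```

The value of t is then looked up in the t table with 9 degrees of freedom, as described above.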

Now let us assume, as is more frequently the case, that we do not know
the value of the population mean but wish to form some idea of the range
within which its value may occur. Our computed values are, as before,
$\bar{X}$ = 575.2 pounds and $\sigma_{\bar{X}}$ = 2.75 pounds; also, N = 10 and n = 9. If
we were dealing with a large sample, the sample means would vary $\pm 2\sigma_{\bar{X}}$
or more from the population mean in about 5 samples out of 100 (see area
table, Appendix E). However, we are considering a small sample with 9
degrees of freedom. Referring to the t table (Appendix F), opposite n = 9
we observe that the sample means would vary $\pm 2.3\sigma_{\bar{X}}$ or more from the
population mean in about 5 samples out of 100. Assuming that the “.05
level of significance” is satisfactory as a criterion, we find that, if the
observed mean is below the population mean, the population mean might be
575.2 + (2.3)(2.75) = 581.5 pounds; while, if the observed mean is above
the population mean, the population mean might be 575.2 − (2.3)(2.75)
= 568.9 pounds. We conclude, then, that it is likely that the population
mean falls between 568.9 pounds and 581.5 pounds, since the fiducial
probability is 95 out of 100 that the population mean falls between these limits.
If a more strict criterion is desired, we may consider the range which will
include (say) 98 per cent of the possible sample values. Entering the t
table at n = 9, we find that 98 per cent of the sample means would vary
within $\pm 2.8\sigma_{\bar{X}}$ (±2.8 × 2.75 = ±7.7 pounds) of the population mean.
It is even more likely, then, that the population mean is between 567.5
and 582.9 pounds, as the fiducial probability is 98 out of 100 that the
population mean occurs within this range.
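
The limits just described follow from one line of arithmetic each. A minimal sketch, assuming the multiples 2.3 and 2.8 read from the t table for 9 degrees of freedom; the helper name `fiducial_limits` is illustrative.

```python
def fiducial_limits(mean: float, standard_error: float, t_multiple: float):
    """Lower and upper limits lying t_multiple standard errors either side of the sample mean."""
    margin = t_multiple * standard_error
    return mean - margin, mean + margin

low95, high95 = fiducial_limits(575.2, 2.75, 2.3)   # 95 out of 100
low98, high98 = fiducial_limits(575.2, 2.75, 2.8)   # 98 out of 100

print(f"95 per cent: {low95:.1f} to {high95:.1f} pounds")
print(f"98 per cent: {low98:.1f} to {high98:.1f} pounds")
```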

TABLE 71

Strength of Lead in Two Number 2 Pencils Manufactured by “Company X”

            Pencil (a)                              Pencil (b)
Test   Strength in kilograms   X₁²        Test   Strength in kilograms   X₂²
               X₁                                        X₂
  1           1.62            2.6244        1           1.78            3.1684
  2           1.74            3.0276        2           1.48            2.1904
  3           1.68            2.8224        3           1.72            2.9584
  4           1.50            2.2500        4           1.62            2.6244
Total         6.54           10.7244      Total         6.60           10.9416

Source: From tests conducted in 1934 by the Eagle Pencil Company.

$$\bar{X}_1 = \frac{6.54}{4} = 1.635 \text{ kilograms.} \qquad \bar{X}_2 = \frac{6.60}{4} = 1.650 \text{ kilograms.}$$

$$\hat{\sigma}_1 = \sqrt{\frac{10.7244}{3} - \frac{(6.54)^2}{4 \cdot 3}} = .1025 \text{ kilograms.} \qquad
\hat{\sigma}_2 = \sqrt{\frac{10.9416}{3} - \frac{(6.60)^2}{4 \cdot 3}} = .1311 \text{ kilograms.}$$

$$\sigma_{\bar{X}_1} = \frac{.1025}{\sqrt{4}} = .0512 \text{ kilograms.} \qquad
\sigma_{\bar{X}_2} = \frac{.1311}{\sqrt{4}} = .0656 \text{ kilograms.}$$


Significance of the difference between two means when N is small.
Table 71 shows data of the strength of the points of two Number 2 pencils
manufactured by a company designated as “Company X.” Let us ascertain
if the difference in the mean strength of pencils (a) and (b) made by
Company X is significant. From Table 71 it is computed that $\bar{X}_1$ = 1.635
and $\bar{X}_2$ = 1.650 kilograms, while $\sigma^2_{\bar{X}_1}$ = .0026 and $\sigma^2_{\bar{X}_2}$ = .0043. The
standard error of the difference between the means is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{.0026 + .0043} = .0831 \text{ kilograms,}$$

and

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{1.635 - 1.650}{.0831} = \frac{-.015}{.0831} = -.18.$$

This is the t of the t table (Appendix F) which we consider in conjunction
with n, the degrees of freedom present. In this instance, since the mean
was computed for each series, there are three degrees of freedom present
in each series: $n_1 = 3$, $n_2 = 3$; $n = n_1 + n_2 = 6$. When n = 6, t = .131
for P = .9, and t = .265 for P = .8. Therefore, when t = .18, P is about
.86, indicating that a difference of ±.015 kilograms might occur about 86
times in 100. There is thus no evidence of a significant difference in
strength between the points of these two pencils.
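
The pencil comparison can be recomputed from the eight readings of Table 71. This is a sketch with illustrative names; it carries unrounded intermediate values, so the last digit may differ slightly from hand computation.

```python
import math
import statistics

pencil_a = [1.62, 1.74, 1.68, 1.50]   # strengths in kilograms
pencil_b = [1.78, 1.48, 1.72, 1.62]

mean_a = statistics.fmean(pencil_a)
mean_b = statistics.fmean(pencil_b)

# Squared standard error of each mean: variance estimate (divisor N - 1) over N.
se_sq_a = statistics.variance(pencil_a) / len(pencil_a)
se_sq_b = statistics.variance(pencil_b) / len(pencil_b)

se_diff = math.sqrt(se_sq_a + se_sq_b)
t_ratio = (mean_a - mean_b) / se_diff
degrees_of_freedom = (len(pencil_a) - 1) + (len(pencil_b) - 1)

print(f"se of difference = {se_diff:.4f}, t = {t_ratio:.2f}, n = {degrees_of_freedom}")
```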

Significance of the difference between two means when $N_1 \neq N_2$ and
when both N's are small. From archaeological excavations conducted at
a certain site, 16 lower first molars were recovered. These showed a
mean length $\bar{X}_1$ of 13.57 millimeters and $\sigma_1$ of .72 millimeters. From a
nearby site, 9 lower first molars were taken with $\bar{X}_2$ = 13.06 and $\sigma_2$ = .62
millimeters. Is there a significant difference in the mean length of these
two groups of lower first molars? When $N_1 \neq N_2$, we pool the variances
of the two series by use of the expression shown on page 322.

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{(N_1 + N_2)(\Sigma x_1^2 + \Sigma x_2^2)}{N_1 N_2 [(N_1 - 1) + (N_2 - 1)]}}
= \sqrt{\frac{(16 + 9)(8.2944 + 3.4596)}{16 \cdot 9\,(15 + 8)}} = .298.$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{.51}{.298} = 1.71.$$

The first set of data contributes 15 degrees of freedom, while the second
contributes 8, and n = 15 + 8 = 23. Referring to the t table, for t = 1.71

Based upon illustrative figures used in a lecture by Professor Egon Pearson at
Columbia University in 1931.

We obtain $\Sigma x_1^2$ from $\sigma_1$ as follows:

$$\sigma_1 = \sqrt{\frac{\Sigma x_1^2}{N_1}}; \qquad \Sigma x_1^2 = N_1\sigma_1^2.$$

Therefore $\Sigma x_1^2 = 16(.72)^2 = 8.2944$. Similarly, for the second set of data,
$\Sigma x_2^2 = 9(.62)^2 = 3.4596$.

Chap. 12] 


ARITHMETIC MEANS 


331 


and n = 23, we find that there is about 1 chance in 10 that such an
observed difference (±.51 millimeters) might occur by chance and we
conclude that there is no clear evidence that the difference is significant.
If, instead of using the t table, we had referred to the table of areas of
the normal curve, we would have found that for $x/\sigma$ = 1.71 the chances of
obtaining a difference of ±.51 were 872 out of 10,000, or 8.7 out of 100.
While the difference in the P values is not great when n = 23, it will be
found to be progressively greater as n becomes smaller.
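
The pooled computation for the molars, together with the normal-curve probability mentioned above, can be sketched as follows. The names are illustrative, and the sums of squares are recovered from the quoted standard deviations as $N\sigma^2$.

```python
import math

n1, mean1, sd1 = 16, 13.57, 0.72   # first site (sd computed with divisor N)
n2, mean2, sd2 = 9, 13.06, 0.62    # nearby site

sum_sq1 = n1 * sd1 ** 2            # sum of squared deviations, first series
sum_sq2 = n2 * sd2 ** 2            # sum of squared deviations, second series

se_diff = math.sqrt(
    (n1 + n2) * (sum_sq1 + sum_sq2) / (n1 * n2 * ((n1 - 1) + (n2 - 1)))
)
t_ratio = (mean1 - mean2) / se_diff
degrees_of_freedom = (n1 - 1) + (n2 - 1)

# Two-sided chance if the normal curve were (improperly) used in place of the t table.
normal_two_sided = math.erfc(t_ratio / math.sqrt(2))

print(f"se = {se_diff:.3f}, t = {t_ratio:.2f}, n = {degrees_of_freedom}")
print(f"normal-curve chance = {normal_two_sided:.4f}")
```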

In the preceding paragraphs we have considered the significance of dif- 
ferences between single pairs of means taken at random. If instead of a 
single pair of means obtained from two samples we have a hundred sample 
differences the true value of which is zero, we would expect about five of 
the differences to be beyond the range of twice the standard error; one 
would be expected to be beyond the range of 2.6 times the standard error. 
Such a great difference might erroneously be reported as significant. It
is not within the scope of this text to discuss this problem. Tippett
suggests a procedure whereby in testing the significance of the greatest
difference between two samples out of a group of samples the usual
procedure is slightly modified. The method of analysis of variance, discussed
in the following chapter, may also be applied.

CHAPTER XIII

RELIABILITY AND SIGNIFICANCE OF 
STATISTICAL MEASURES 

PERCENTAGES, STANDARD DEVIATIONS, VARIANCES, 
AND THE CRITERION OF LIKELIHOOD 


Reliability of Sample Percentages 

Standard error of a sample percentage. If a percentage (or a proportion)
has been computed from a sample drawn at random from a larger
population, it is subject to sampling variations, as is any other statistical
measure. The standard error of a percentage $\sigma_p$ is given by the expression

$$\sigma_p = \sqrt{\frac{pq}{N}},$$

where p is the proportion in the population expressed as a decimal,
q = 1 − p, and N is the number of items in the sample.

Reliability of a sample percentage. If the proportions of white and
black marbles in a large assortment are equal, we have p (proportion
white) = .50 and q (proportion black) = .50. Assuming a normal
distribution of sample proportions about the proportion in the population we
find that if random samples of 100 marbles are drawn we should expect
about 68 out of 100 of such samples to show proportions of white marbles
varying within $\pm\sigma_p$ of .50. Since

$$\sigma_p = \sqrt{\frac{(.50)(.50)}{100}} = \sqrt{.0025} = .05,$$

the expected range is from 45 to 55 per cent. Likewise we should expect
about 95 out of 100 samples to show from 40 to 60 per cent of white
marbles, and about 99 out of 100 to show from 35 to 65 per cent of white
marbles.
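
The ranges for the marbles follow from one, two, and three standard errors on either side of .50. A minimal sketch, with the function name `se_proportion` chosen for illustration:

```python
import math

def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion when the population proportion is p."""
    return math.sqrt(p * (1 - p) / n)

p, n = 0.50, 100
se = se_proportion(p, n)

# Ranges of one, two, and three standard errors about the population proportion.
ranges = {k: (p - k * se, p + k * se) for k in (1, 2, 3)}
for k, (low, high) in ranges.items():
    print(f"within {k} standard error(s): {low:.2f} to {high:.2f}")
```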

In a group of 40 first cousins there were found to be 22 males and 18
females. Let us ascertain if the observed proportions are inconsistent
with the hypothesis that the sexes should be in equal proportions. Letting
p represent the proportion of males and q the proportion of females in the
population, we have:

$$\sigma_p = \sqrt{\frac{(.50)(.50)}{40}} = \sqrt{\frac{.25}{40}} = \sqrt{.00625} = .079.$$

The observed proportion of males $p_s$ was 22/40, or 55 per cent, while the
proportion in the population was 50 per cent. This gives

$$\frac{x}{\sigma} = \frac{p_s - p}{\sigma_p} = \frac{.55 - .50}{.079} = \frac{.05}{.079} = .63.$$

Looking up this ratio in the table of areas of the normal curve shows that
a divergence of 5 per cent or more in favor of males might occur in about
26 samples in 100, and that a divergence of 5 per cent or more in either
direction might be expected to occur in about 52 samples in 100. We
therefore conclude that the observed proportions do not show a significant
divergence from the population values.
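
The test on the first cousins can be retraced as below. This is a sketch assuming the normal approximation used in the text; the names are illustrative.

```python
import math

males, total = 22, 40
p = 0.50                               # hypothetical proportion of males in the population
p_sample = males / total

se = math.sqrt(p * (1 - p) / total)    # standard error of the sample proportion
ratio = (p_sample - p) / se

one_direction = 0.5 * math.erfc(ratio / math.sqrt(2))   # excess of males only
either_direction = 2 * one_direction

print(f"se = {se:.3f}, ratio = {ratio:.2f}")
print(f"one direction = {one_direction:.2f}, either direction = {either_direction:.2f}")
```

The exact areas differ slightly in the second decimal from values read off a printed table at the rounded ratio, which does not alter the conclusion.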

In October 1911 the Chicago, Milwaukee, St. Paul & Pacific Railway
installed a number of ties near Hartford, Wisconsin, for the purpose of
testing various kinds of wood and various materials and processes for
preserving the wood. One lot of 50 red oak ties was preserved by means of
creosote applied by the “full cell” process. In June 1934, after about
22⅔ years of service, 22 ties, or 44 per cent, were still in good condition.¹
How far may this value be from the population percentage? Since the
values of p and q in the population are not known, we obtain a rough
approximation by substituting the proportions found in the sample. Thus

$$\sigma_p = \sqrt{\frac{(.44)(.56)}{50}} = \sqrt{\frac{.2464}{50}} = .070, \text{ or 7.0 per cent.}$$

The chances are therefore about 95 out of 100 that the observed percentage
is within ±1.96 × .07 (or 13.7 per cent) of the population percentage.
Further, the chances are 99 out of 100 that the observed percentage lies within
the range of $\pm 2.58\sigma_p$ (or 18.1 per cent) of the population percentage. The
reliability of this percentage is thus quite low. In terms of fiducial
probabilities we might say that there is a 95 per cent fiducial probability that the
population percentage lies between about 30 and 58 per cent, and a 99 per
cent fiducial probability that the population percentage lies between about
26 and 62 per cent. A somewhat more satisfactory determination of the
fiducial limits is given on pages 335-337.
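
The rough limits for the red oak ties can be reproduced as follows. As the text notes, substituting the sample proportion for the unknown population proportion gives only an approximation; the names in this sketch are illustrative.

```python
import math

good, total = 22, 50
p_sample = good / total                               # .44
se = math.sqrt(p_sample * (1 - p_sample) / total)     # sample proportions substituted for p and q

low95, high95 = p_sample - 1.96 * se, p_sample + 1.96 * se
low99, high99 = p_sample - 2.58 * se, p_sample + 2.58 * se

print(f"standard error = {se:.3f}")
print(f"95 per cent limits: {low95:.2f} to {high95:.2f}")
print(f"99 per cent limits: {low99:.2f} to {high99:.2f}")
```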

¹ The data are from Proceedings of the American Wood Preservers' Association, 1935,
pp. 133-134.

Reliability of percentages and the $\chi^2$ test. Let us consider another
procedure for evaluating the reliability of ratios. In the group of 40 first
cousins mentioned before, there were found to be 22 males and 18 females.
Again we shall ascertain if this distribution is inconsistent with the
hypothesis that the sex ratio is 1 : 1. We shall proceed to make the test
exactly as was done for testing the fit of a normal curve in an earlier
chapter. The results are as follows:


Sex        Observed    Expected,      f − f_c    (f − f_c)²    (f − f_c)²/f_c
              f        ratio 1 : 1
                          f_c
Male          22          20            +2           4             .20
Female        18          20            −2           4             .20
Total         40          40                                       .40

The value of $\chi^2$ is .40 and there are two categories, male and female.
The two sets of data were brought into agreement in respect to totals
and thus one degree of freedom was lost. For n = 1 and $\chi^2$ = .40, we
find P (see Appendix I) is slightly more than .50. The proportion of the
distribution of $\chi^2$ beyond $\chi^2$ = .40, when n = 1, may be visualized by
referring to the chart accompanying Appendix I. The test indicates that
the difference from expectation could well have arisen from chance and
therefore does not indicate a significant digression from the expected. This
conclusion is the same as that arrived at earlier.²
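
The tabular computation can be sketched in code. For one degree of freedom the probability can be had from the normal curve, since $\chi^2$ is then the square of a normal deviate; that shortcut, and the function names, are assumptions of this sketch rather than the procedure of Appendix I.

```python
import math

def chi_square(observed, expected):
    """Sum of (f - fc)^2 / fc over the categories."""
    return sum((f - fc) ** 2 / fc for f, fc in zip(observed, expected))

def p_value_one_degree(chi2: float) -> float:
    """Chance of a chi-square this large or larger when n = 1."""
    return math.erfc(math.sqrt(chi2 / 2))

observed = [22, 18]        # males, females
expected = [20, 20]        # ratio 1 : 1 applied to 40 cousins

chi2 = chi_square(observed, expected)
p = p_value_one_degree(chi2)
print(f"chi-square = {chi2:.2f}, P = {p:.2f}")
```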

The value of $\chi^2$ may also be obtained from the expression³

$$\chi^2 = \frac{(aq - bp)^2}{pqN},$$

where: a = the number of times the first factor occurred;
b = the number of times the second factor occurred;
p = the expected or hypothetical probability associated with the
first factor;
q = the expected or hypothetical probability associated with the
second factor;
N = a + b.

² Similar results will also be obtained by using the standard error of the number of
occurrences, $\sigma_a = \sqrt{Npq}$, computing $\dfrac{a - Np}{\sigma_a}$ (where p is the proportion in the
population and a is the actual number of occurrences in the sample corresponding to p), and
referring to the table of areas of the normal curve.

³ This expression may also be written

$$\chi^2 = \frac{(a - Np)^2}{Npq}.$$

See Appendix B, section XIII-1, for a development of these two expressions from the
usual expression for $\chi^2$.

For the data of forty first cousins,

$$\chi^2 = \frac{[(22)(.50) - (18)(.50)]^2}{(.50)(.50)(40)} = \frac{(2)^2}{10} = .40,$$

the same as was found before.

Upon the basis of past experience the fatality rate from typhoid fever
for a certain community was found to be 14.2 per cent (that is, reported
deaths from typhoid fever ÷ reported cases of typhoid fever = .1420).
A survey was made of certain congested areas. The homes studied were
selected as nearly as possible at random and a fatality ratio of 30.0 per cent
(36 deaths) was found for 120 cases of typhoid fever. Does this represent a
significant departure from the population value 14.2? We shall compute
$\chi^2$ and determine the probability that such a value of $\chi^2$ might arise by
chance.

$$\chi^2 = \frac{\left(36 - \frac{.142}{.858}\,84\right)^2}{\frac{.142}{.858}\,120} = 24.59.$$

As before, there are two categories (that is, patients may survive or die),
one degree of freedom has been lost since the observed and expected data
have been brought into agreement with respect to totals, and $n = 1$.
Such a value of $\chi^2$ is far beyond the .001 level and we conclude that the
difference is clearly significant.
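As a check on the two-class expression for $\chi^2$ and on its alternative form $(a - Np)^2 / Npq$, the sketch below evaluates both for the first cousins and for the typhoid data. The function name is hypothetical, and the .001 point for one degree of freedom (10.827) is taken from a standard $\chi^2$ table, not from the passage above.

```python
def chi_square_two_classes(a, b, p):
    """Chi-square for two classes, computed in both equivalent forms.

    a, b : observed counts of the first and second factor
    p    : hypothetical probability of the first factor (q = 1 - p)
    """
    q = 1.0 - p
    n = a + b
    ratio = p / q
    first_form = (a - ratio * b) ** 2 / (ratio * n)
    second_form = (a - n * p) ** 2 / (n * p * q)
    return first_form, second_form

cousins = chi_square_two_classes(22, 18, 0.5)     # 22 males, 18 females
typhoid = chi_square_two_classes(36, 84, 0.142)   # 36 deaths, 84 survivors

CHI2_001_ONE_DF = 10.827  # tabled .001 point for n = 1 (assumed from a standard table)
print(cousins, typhoid, typhoid[0] > CHI2_001_ONE_DF)
```

Both forms give the same value because $aq - bp = a - Np$ when $b = N - a$.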

We may also use the $\chi^2$ expression just given to ascertain the fiducial
limits of $p$ from sample values of $a$, $b$, and $N$. Considering the data of
50 red oak ties creosoted by the “full cell” process which were given
previously in this chapter, it was found that after a certain period of service
22 (or 44 per cent) of the ties were still in good condition. Thus $a = 22$,
$b = 28$, $N = 50$. Let us first determine the 95 per cent fiducial limits for $p$.
Referring to the first use of $\chi^2$ in this chapter (for the determination of $\chi^2$
for the sex distribution of first cousins), it will be apparent that whether
the observed frequencies exceed or fall below the expected frequencies does
not matter; either divergence makes $\chi^2$ large and $P$ small. Thus the 95
per cent fiducial level is given by taking the value of $\chi^2$ at $P = .05$.
(When $n = 1$ and $P = .05$, $\chi^2 = 3.841$.) The computations are

$$3.841 = \frac{\left(22 - \frac{p}{q}\,28\right)^2}{\frac{p}{q}\,50}$$

$$192.050\,\frac{p}{q} = 484 - 1232\,\frac{p}{q} + 784\,\frac{p^2}{q^2}$$

$$484 - 1424.050\,\frac{p}{q} + 784\,\frac{p^2}{q^2} = 0.$$

For a quadratic of the form $a + bX + cX^2 = 0$, when $a$, $b$, and $c$ are
given, we may solve for $X$ by use of the expression

$$X = \frac{-b \pm \sqrt{b^2 - 4ac}}{2c}.$$

Thus

$$\frac{p}{q} = \frac{1424.050 \pm \sqrt{(1424.050)^2 - 4(784)(484)}}{2(784)}
            = \frac{1424.050 \pm 714.209}{1568}
            = \frac{2138.259}{1568} \text{ and } \frac{709.841}{1568}.$$

For the first of these fractions,

$$\frac{p}{q} = \frac{2138.259}{1568},$$

$$\frac{p}{p + q} = \frac{2138.259}{2138.259 + 1568},$$

$$p = .577 \text{ (or 57.7 per cent)}.$$

Solving the second in similar fashion,

$$p = .312 \text{ (or 31.2 per cent)}.$$

From the above we conclude that there is a 95 per cent fiducial probability
that the population value of $p$ (proportion of ties surviving) lies between
31.2 and 57.7 per cent.

If we desire to ascertain the 99 per cent fiducial limits of $p$, we use
$\chi^2 = 6.635$, which gives

$$6.635 = \frac{\left(22 - \frac{p}{q}\,28\right)^2}{\frac{p}{q}\,50}$$

$$784\,\frac{p^2}{q^2} - 1563.750\,\frac{p}{q} + 484 = 0$$

$$\frac{p}{q} = \frac{1563.750 \pm 963.063}{1568}$$

$$p = .617 \text{ and } .277.$$

These results tell us that there is a 99 per cent fiducial probability that
the population value of $p$ falls between 27.7 and 61.7 per cent. These are
non-equidistant limits* about the sample percentage in contrast to the
rougher equidistant limits on page 333.
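The quadratic solution for the fiducial limits can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and the $\chi^2$ values 3.841 and 6.635 are the tabled points quoted in the text for $n = 1$.

```python
import math

def fiducial_limits(a, n_items, chi2):
    """Solve chi2 = (a - r*b)^2 / (r*N) for r = p/q, then convert r to p.

    Expanding gives b^2 r^2 - (2ab + chi2*N) r + a^2 = 0.
    Returns (lower limit of p, upper limit of p).
    """
    b = n_items - a
    qa = b * b
    qb = -(2 * a * b + chi2 * n_items)
    qc = a * a
    root = math.sqrt(qb * qb - 4 * qa * qc)
    ratios = ((-qb - root) / (2 * qa), (-qb + root) / (2 * qa))
    # p = r / (1 + r), since p + q = 1
    return tuple(r / (1 + r) for r in ratios)

limits_95 = fiducial_limits(22, 50, 3.841)   # 22 of 50 ties in good condition
limits_99 = fiducial_limits(22, 50, 6.635)
print([round(x, 3) for x in limits_95], [round(x, 3) for x in limits_99])
```

The limits are not equidistant from the sample value of .44, because the quadratic is solved for the ratio $p/q$ rather than for a symmetric deviation about $p$.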

Significance of difference between percentages. At the same time the
50 red oak ties preserved with creosote by the “full cell” process were
laid, another group of 50 red oak ties was installed. The second lot,
however, was creosote-impregnated by the “Rueping” process. Of this lot,
18 ties, or 36 per cent, were still in service in June 1934. Assuming that
both lots were subjected to identical conditions otherwise (and this
appears to be true), is the difference between the percentages significant?
For the “full cell” processed ties, $p_1$ was .44, or 44 per cent, and $\sigma_{p_1}$ was
found to be .070. For the “Rueping” processed ties,

$$\sigma_{p_2} = \sqrt{\frac{(.36)(.64)}{50}} = .068, \text{ or 6.8 per cent.}$$

The standard error of the difference between the two percentages is

$$\sigma_{p_1 - p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2} = \sqrt{.070^2 + .068^2} = .098, \text{ or 9.8 per cent.}$$

The observed difference between the two percentages is 44 − 36 = 8.

$$\frac{x}{\sigma} = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}} = \frac{8}{9.8} = .82.$$

This ratio is so low that it is clear that the advantage of 8 per cent in
favor of the “full cell” process may have been due to chance.

* A more accurate method is given by Clopper and Pearson in Biometrika, Vol. XXVI,
pp. 404-413.

During a three-year period, experiments with two types of lighting
systems were conducted in an elementary school. Room 1 had two 150-watt
manually controlled lights, which were turned on and off by teacher or
pupils as needed. Room 2 had four 300-watt indirect lights, controlled
by automatic relays which turned the lights on when additional
illumination was needed and off when no longer required. The pupils were sixth
grade students. All were given the “Otis Self-Administered Test of Mental
Ability” and the “Standard Achievement Test.” They were divided
equally between Room 1 and Room 2 according to the results of these
tests. The two classrooms were identical second floor rooms with north
light. Two teachers employing the same teaching methods were assigned
to departmental work with the two classes, each teacher teaching certain
subjects in both sections. During the three-year period there were 115
pupils in Room 1 and 112 pupils in Room 2. In Room 1 there were 29
failures; in Room 2 there were 9 failures. The percentages failing were
25.2 per cent for Room 1, and 8.0 per cent for Room 2.^ Is there a
significant difference between the two percentages?

For Room 1: $\sigma_{p_1} = \sqrt{\frac{(.252)(.748)}{115}} = .0405$, or 4.05 per cent.

For Room 2: $\sigma_{p_2} = \sqrt{\frac{(.080)(.920)}{112}} = .0256$, or 2.56 per cent.

The standard error of the difference is

$$\sigma_{p_1 - p_2} = \sqrt{.0405^2 + .0256^2} = .0479, \text{ or 4.79 per cent.}$$

Failures in Room 1 exceeded those in Room 2 by 25.2 − 8.0 = 17.2 per
cent, and

$$\frac{x}{\sigma} = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}} = \frac{17.2}{4.79} = 3.59.$$

From the table of areas of the normal curve (Appendix E), it appears
that, if the true difference between $p_1$ and $p_2$ is zero, a difference of 17.2
or more in favor of $p_1$ might occur about 1.6 times in 10,000. There
appear to have been significantly fewer failures in Room 2, the difference
presumably being attributable to better and more adaptable lighting.
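The comparison of two percentages may be sketched as follows. The function names are illustrative, and the tail area is computed from the normal curve with `math.erfc` instead of being read from Appendix E; the percentages are the rounded values used in the text.

```python
import math

def se_percentage(p, n_items):
    """Standard error of a proportion: sqrt(p * q / N)."""
    return math.sqrt(p * (1.0 - p) / n_items)

def difference_ratio(p1, n1, p2, n2):
    """Return (standard error of p1 - p2, ratio of the difference to it)."""
    se_diff = math.sqrt(se_percentage(p1, n1) ** 2 + se_percentage(p2, n2) ** 2)
    return se_diff, (p1 - p2) / se_diff

def upper_tail(ratio):
    """Area of the normal curve beyond the given ratio (one tail)."""
    return 0.5 * math.erfc(ratio / math.sqrt(2.0))

se_ties, ratio_ties = difference_ratio(0.44, 50, 0.36, 50)        # red oak ties
se_rooms, ratio_rooms = difference_ratio(0.252, 115, 0.080, 112)  # lighting experiment
tail_rooms = upper_tail(ratio_rooms)
print(round(ratio_ties, 2), round(ratio_rooms, 2), tail_rooms)
```

A ratio near .8 is easily produced by chance, while a ratio near 3.6 corresponds to a one-tail area of well under one in a thousand.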

^ The data are from F. C. Albert, “Scholarship Improved by Light,” Transactions of
the Illuminating Engineers Society, December 1933, pp. 866-872.

When $N_1 \neq N_2$, instead of using two estimates of the $p$ value for the
population, we may combine the information available from the samples
and make one estimate. This is a reasonable procedure since we set up
the hypothesis that both samples are from the same universe. Denoting
the combined estimate by $p_{1+2}$,

$$p_{1+2} = \frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}$$

and

$$\sigma_{p_1 - p_2} = \sqrt{\frac{p_{1+2}\,q_{1+2}}{N_1} + \frac{p_{1+2}\,q_{1+2}}{N_2}},$$

which is equivalent to

$$\sigma_{p_1 - p_2} = \sqrt{p_{1+2}\,q_{1+2}\,\frac{N_1 + N_2}{N_1 N_2}}.$$

The procedure, from this point on, is the same as before.
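A minimal sketch of the combined-estimate procedure, applied to the lighting experiment (29 failures among 115 pupils and 9 among 112), is given below. The function name is hypothetical, and the text itself does not carry this particular computation through, so the figures it prints are an illustration rather than a result quoted from the source.

```python
import math

def pooled_difference_ratio(x1, n1, x2, n2):
    """Ratio of p1 - p2 to its standard error, using one combined estimate of p.

    x1, x2 : number of occurrences in each sample
    n1, n2 : sample sizes
    """
    p1, p2 = x1 / n1, x2 / n2
    p_comb = (n1 * p1 + n2 * p2) / (n1 + n2)   # combined estimate p_{1+2}
    q_comb = 1.0 - p_comb
    se_diff = math.sqrt(p_comb * q_comb * (n1 + n2) / (n1 * n2))
    return p_comb, se_diff, (p1 - p2) / se_diff

p_comb, se_diff, ratio = pooled_difference_ratio(29, 115, 9, 112)
print(round(p_comb, 4), round(se_diff, 4), round(ratio, 2))
```

The combined estimate is simply the total number of failures divided by the total number of pupils, so the two samples are weighted by their sizes.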


Reliability of Measures of Dispersion 

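Reliability of a sample $\sigma$. We may test the significance of a sample
$\sigma$ in a manner similar to that previously described for $\bar{X}$. The standard
error of the standard deviation $\sigma_\sigma$ is given by

$$\sigma_\sigma = \frac{\sigma_p}{\sqrt{2N}}.$$

If kurtosis is present,

$$\sigma_\sigma = \frac{\sigma_p}{\sqrt{2N}} \sqrt{1 + \frac{\gamma_2}{2}},$$

where $\gamma_2$ is the kurtosis in the population.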
The standard deviation of the weights of 746 United States soldiers of
French extraction was shown in Table 68 to be 16.03 pounds. At the
time of demobilization, weight measurements were made not only of the
United States soldiers of French extraction, but of over 80,000 soldiers of
all types. For the entire group the value of $\sigma_p$ was 17.06 pounds. Does
the observed $\sigma$ for the French differ significantly from this value? The
value of $\sigma_\sigma = \frac{17.06}{\sqrt{2(746)}} = .4417$ pounds. The $\sigma$ for the French troops was
1.027 pounds less than that for all troops. Comparing this difference
with $\sigma_\sigma$ gives

$$\frac{x}{\sigma} = \frac{\sigma_p - \sigma}{\sigma_\sigma} = \frac{1.027}{.4417} = 2.33.$$

Since $N$ is large, we refer to the table of areas of the normal curve
(Appendix E) and conclude that such a difference would rarely occur through
chance arising from sampling.

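If the value of $\sigma_p$ is not known, we must substitute $\hat{\sigma}$ as computed from
the sample and determine

$$\sigma_\sigma = \frac{\hat{\sigma}}{\sqrt{2N}}.$$

It will be recalled that $\sigma_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}}$, therefore

$$\sigma_\sigma = \frac{1}{\sqrt{2}}\,\sigma_{\bar{X}} = .7071068\,\sigma_{\bar{X}}.$$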
Reliability of $\sigma$ when $N$ is small. Values of $\sigma$ computed from small
samples are not distributed normally or symmetrically. The distribution
of sample values of $\sigma^2$ may be put in the form

$$\chi^2 = \frac{N\sigma^2}{\sigma_p^2},$$

where the distribution of the population is assumed to be normal. Using
this expression in conjunction with a table of $\chi^2$, using $n = N - 1$,
we may ascertain the sampling variation of $\sigma^2$ or $\sigma$, provided we know the
value of $\sigma_p$. Suppose that we have a sample of 10 items drawn from a
population having $\sigma_p = 8$ pounds. What are the limits within which 98
per cent of the sample $\sigma$'s (from samples of $N = 10$) would be expected to
fall? It is apparent from the expression that a high value of $\sigma^2$ is
associated with a high value of $\chi^2$ and that a low value of $\sigma^2$ is associated
with a low value of $\chi^2$. We therefore determine the value of $\chi^2$ at the
.01 point and at the .99 point (since these limits include the central 98
per cent of the values of $\chi^2$) and solve the expression above. Referring
to the $\chi^2$ table, for $n = 9$ and $P = .01$, we find $\chi^2 = 21.666$ and therefore

$$21.666 = \frac{10\sigma^2}{8^2}$$
$$10\sigma^2 = 1386.624$$
$$\sigma^2 = 138.6624$$
$$\sigma = 11.78 \text{ pounds.}$$

Referring to the $\chi^2$ table, for $n = 9$ and $P = .99$, we find $\chi^2 = 2.088$ and

$$2.088 = \frac{10\sigma^2}{8^2}$$
$$10\sigma^2 = 133.632$$
$$\sigma^2 = 13.3632$$
$$\sigma = 3.66 \text{ pounds.}$$

From the above we conclude that sample $\sigma$'s from samples of $N = 10$
would fall within the limits of 3.66 pounds and 11.78 pounds in 98 out of
100 instances.
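The same limits follow directly from solving $\chi^2 = N\sigma^2 / \sigma_p^2$ for $\sigma$. The sketch below assumes the two tabled values quoted in the text (21.666 and 2.088 for $n = 9$); the function name is illustrative.

```python
import math

def sample_sigma_limits(sigma_p, n_items, chi2_low_point, chi2_high_point):
    """Limits for the sample standard deviation when sigma_p is known.

    Solves chi2 = N * sigma^2 / sigma_p^2 for sigma at two tabled chi-square
    values (the smaller chi-square gives the lower limit).
    """
    lower = math.sqrt(chi2_low_point * sigma_p ** 2 / n_items)
    upper = math.sqrt(chi2_high_point * sigma_p ** 2 / n_items)
    return lower, upper

# n = 9: chi-square is 2.088 at the .99 point and 21.666 at the .01 point
lower, upper = sample_sigma_limits(sigma_p=8.0, n_items=10,
                                   chi2_low_point=2.088, chi2_high_point=21.666)
print(round(lower, 2), round(upper, 2))
```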

The foregoing is useful if we know the value of $\sigma_p$. This will very
rarely be true unless we have set up a control population (as is sometimes
done in manufacturing) and desire to ascertain if samples selected from
time to time correspond closely with this control group.

Ordinarily we know only the value of $\sigma$ and $N$. When this is so, we
may revert to the idea of fiducial probability and, using the same
expression, ascertain the limits within which $\sigma_p$ may confidently be expected
to fall.
Let us determine the 90 per cent fiducial or confidence limits of $\sigma_p$ for
the hard-drawn copper wire previously referred to (Table 70). The value
of $\sigma$ was 8.26 pounds, and $N$ was 10. Using the expression

$$\chi^2 = \frac{N\sigma^2}{\sigma_p^2},$$

we proceed somewhat as before. At the .05 point, when $n = 9$, the value of
$\chi^2$ is 16.919 and

$$16.919 = \frac{10(8.26)^2}{\sigma_p^2}$$
$$16.919\,\sigma_p^2 = 682.276$$
$$\sigma_p^2 = 40.3260$$
$$\sigma_p = 6.35 \text{ pounds.}$$

At the .95 point, $\chi^2 = 3.325$ when $n = 9$ and

$$3.325 = \frac{10(8.26)^2}{\sigma_p^2}$$
$$3.325\,\sigma_p^2 = 682.276$$
$$\sigma_p^2 = 205.1958$$
$$\sigma_p = 14.32 \text{ pounds.}$$

There is a 5 per cent fiducial chance that $\sigma_p$ is less than 6.35 pounds, and
a 5 per cent fiducial chance that $\sigma_p$ is greater than 14.32 pounds. The
fiducial probability is 90 per cent that $\sigma_p$ falls between 6.35 and 14.32
pounds.

Now let us ascertain the 98 per cent fiducial limits of $\sigma_p$ for the copper
wire. At the .01 point, when $n = 9$, the value of $\chi^2$ is 21.666 and

$$21.666 = \frac{10(8.26)^2}{\sigma_p^2}$$
$$21.666\,\sigma_p^2 = 682.276$$
$$\sigma_p^2 = 31.4906$$
$$\sigma_p = 5.61 \text{ pounds.}$$

At the .99 point, $\chi^2 = 2.088$ when $n = 9$ and

$$2.088 = \frac{10(8.26)^2}{\sigma_p^2}$$
$$2.088\,\sigma_p^2 = 682.276$$
$$\sigma_p^2 = 326.7605$$
$$\sigma_p = 18.075 \text{ pounds.}$$

There is a 1 per cent fiducial chance that $\sigma_p$ is less than 5.61 pounds, and
a 1 per cent fiducial chance that $\sigma_p$ is greater than 18.075 pounds. The
fiducial probability is 98 per cent that $\sigma_p$ lies between 5.61 and 18.075
pounds. If we need to reduce the fiducial limits of $\sigma_p$, we must study a
larger sample.
Considering the expression $\chi^2 = \frac{N\sigma^2}{\sigma_p^2}$, it is apparent that, for given
values of $\chi^2$ and $N$, the ratio $\frac{\sigma}{\sigma_p}$ (or $\frac{\sigma^2}{\sigma_p^2}$) will always be a constant, as will
also the ratio $\frac{\sigma_p}{\sigma}$ (or $\frac{\sigma_p^2}{\sigma^2}$). For any given level of fiducial probability and
any given sample size, it is possible to ascertain this ratio. Since we are
interested in inferring the value of $\sigma_p$ from a known value of $\sigma$, we shall
consider the ratio $\frac{\sigma_p}{\sigma}$. Let us call this ratio $b$ (that is, $b = \frac{\sigma_p}{\sigma}$) and write

$$\sigma_p = b\sigma.$$

Suppose we wish to determine the values of $b$ for the .05 and .95 level
when $N = 10$ ($n = 9$). Referring back to the illustration of the
hard-drawn copper wire, we found that there was a 90 per cent fiducial
probability that $\sigma_p$ fell between 6.35 pounds and 14.32 pounds, while $\sigma$ was
8.26 pounds. Using $b_1$ for the lower fiducial value, we have

$$b_1 = \frac{6.35}{8.26} = .769.$$

Using $b_2$ for the upper fiducial value gives

$$b_2 = \frac{14.32}{8.26} = 1.734.$$

For samples of $N = 10$, there is a 90 per cent fiducial probability that $\sigma_p$
falls between $b_1\sigma$ and $b_2\sigma$. In similar fashion, the values of $b_1$ and $b_2$ for
samples of various sizes could be determined. A number of these values
are given in Table 72 and enable us quickly to ascertain the 90 per cent
fiducial limits of $\sigma_p$.

The values of $b_3$ and $b_4$ for the 98 per cent fiducial limits of $\sigma_p$ when
$N = 10$ may also be ascertained. We found a 98 per cent fiducial
probability that $\sigma_p$ for the hard-drawn copper wire fell between 5.61 and 18.075
pounds. Then

$$b_3 = \frac{5.61}{8.26} = .679$$

$$b_4 = \frac{18.075}{8.26} = 2.188.$$

For samples of $N = 10$, there is a 98 per cent fiducial probability that $\sigma_p$
falls between $b_3\sigma$ and $b_4\sigma$. Similarly, values of $b_3$ and $b_4$ may be computed
for samples of various sizes, and a number of these are given in Table 72
for the 98 per cent fiducial limits of $\sigma_p$.
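Since $\chi^2 = N\sigma^2 / \sigma_p^2$ implies $\sigma_p / \sigma = \sqrt{N / \chi^2}$, each multiplier $b$ depends only on $N$ and the tabled $\chi^2$ value. The sketch below reproduces the row for $N = 10$; the dictionary of $\chi^2$ points holds the four values quoted in the text for $n = 9$, and the names are illustrative.

```python
import math

def b_factor(n_items, chi2):
    """Multiplier b such that sigma_p = b * sigma, from chi2 = N sigma^2 / sigma_p^2."""
    return math.sqrt(n_items / chi2)

# Tabled chi-square points for n = 9, as quoted in the text
CHI2_N9 = {0.05: 16.919, 0.95: 3.325, 0.01: 21.666, 0.99: 2.088}

b1 = b_factor(10, CHI2_N9[0.05])   # lower 90 per cent multiplier
b2 = b_factor(10, CHI2_N9[0.95])   # upper 90 per cent multiplier
b3 = b_factor(10, CHI2_N9[0.01])   # lower 98 per cent multiplier
b4 = b_factor(10, CHI2_N9[0.99])   # upper 98 per cent multiplier

sigma = 8.26  # sample standard deviation of the copper wire, pounds
limits_90 = (b1 * sigma, b2 * sigma)
print([round(b, 3) for b in (b1, b2, b3, b4)], [round(x, 2) for x in limits_90])
```

Computing $b$ from $\chi^2$ directly avoids carrying the rounded fiducial limits through a second division.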
TABLE 72

Values of b_1, b_2, b_3, and b_4 for Determining Fiducial or Confidence Limits of
$\sigma_p$ for Samples of N = 5 to N = 30

          90 per cent fiducial limits     98 per cent fiducial limits
   N          b_1          b_2                b_3          b_4

   5         .726        2.652               .614        4.103
   6         .736        2.289               .631        3.291
   7         .746        2.069               .645        2.833
   8         .754        1.921               .658        2.541
   9         .762        1.815               .669        2.338
  10         .769        1.734               .679        2.188
  11         .775        1.671               .688        2.074
  12         .781        1.620               .697        1.983
  13         .786        1.577               .704        1.908
  14         .791        1.541               .711        1.846
  15         .796        1.511               .717        1.794
  16         .800        1.484               .723        1.749
  17         .804        1.461               .729        1.710
  18         .808        1.441               .734        1.676
  19         .811        1.422               .739        1.646
  20         .815        1.406               .743        1.619
  21         .818        1.391               .748        1.594
  22         .821        1.378               .752        1.572
  23         .823        1.365               .756        1.553
  24         .826        1.354               .759        1.534
  25         .829        1.344               .763        1.518
  26         .831        1.334               .766        1.502
  27         .833        1.326               .769        1.488
  28         .835        1.317               .772        1.474
  29         .838        1.309               .775        1.462
  30         .840        1.302               .778        1.451


Source: Reproduced by permission of the British Standards Institution, 28 Victoria Street, London,
S.W.1, from No. 600, by E. S. Pearson, “The Application of Statistical Methods to Industrial
Standardization and Quality Control,” p. 69, London, 1935. Copies may be obtained from the American
Standards Association, 29 West Thirty-ninth Street, New York, price $1.75.

Significance of difference between two standard deviations when $N$'s
are large and when $N_1 = N_2$. For the groups of left-handed and
right-handed students discussed earlier, it was found that no significant
difference existed between the two mean I.Q.'s. For left-handed students, $\sigma_1$
was 15.1; while for right-handed students, $\sigma_2$ was 15.4. Is the difference
between these measures significant? It will be remembered that $\hat{\sigma}_1 = 15.2$,
$\hat{\sigma}_2 = 15.5$, and $N_1 = N_2 = 68$.

$$\sigma_{\sigma_1} = \frac{\hat{\sigma}_1}{\sqrt{2N_1}} = \frac{15.2}{\sqrt{136}} = 1.30$$

$$\sigma_{\sigma_2} = \frac{\hat{\sigma}_2}{\sqrt{2N_2}} = \frac{15.5}{\sqrt{136}} = 1.33$$

$$\sigma_{\sigma_1 - \sigma_2} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2} = \sqrt{1.30^2 + 1.33^2} = 1.86.$$

The ratio of the observed difference to the standard error of the difference is

$$\frac{x}{\sigma} = \frac{\sigma_1 - \sigma_2}{\sigma_{\sigma_1 - \sigma_2}} = \frac{15.1 - 15.4}{1.86} = \frac{-.3}{1.86} = -.16.$$

Referring to the table of areas of the normal curve, we find that in 872
instances out of 1,000 we might expect to get a difference of ±.3 or more
between the $\sigma$'s through the variations of sampling, or that $\sigma_2$ would
exceed $\sigma_1$ by .3 in 436 cases out of 1,000. We conclude that there is not a
significant difference between the two $\sigma$'s.
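The large-sample comparison of two standard deviations can be sketched as below. The function names are illustrative; the standard errors use the estimated values 15.2 and 15.5 while the observed difference uses 15.1 and 15.4, as in the text, and the two-tail area comes from `math.erfc` rather than from the table of areas.

```python
import math

def se_sigma(sigma_hat, n_items):
    """Standard error of a standard deviation: sigma_hat / sqrt(2N)."""
    return sigma_hat / math.sqrt(2.0 * n_items)

def two_tail_area(ratio):
    """Area of the normal curve beyond +/- ratio."""
    return math.erfc(abs(ratio) / math.sqrt(2.0))

se_diff = math.sqrt(se_sigma(15.2, 68) ** 2 + se_sigma(15.5, 68) ** 2)
ratio = (15.1 - 15.4) / se_diff
area = two_tail_area(ratio)
print(round(se_diff, 2), round(ratio, 2), round(area, 3))
```

An area this close to unity means a difference of the observed size, or larger, is the usual outcome of sampling.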

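It will be apparent from the expressions for $\sigma_{\bar{X}}$ and $\sigma_\sigma$ that

$$\sigma_{\sigma_1 - \sigma_2} = \frac{1}{\sqrt{2}} \times \sigma_{\bar{X}_1 - \bar{X}_2} = .7071068\,\sigma_{\bar{X}_1 - \bar{X}_2}.$$

Referring to page 319, it was found that $\sigma_{\bar{X}_1 - \bar{X}_2}$ for the I.Q.'s of
left-handed and right-handed students was 2.63, and .7071068 × 2.63 = 1.86,
which is the same as computed above.^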

Significance of differences between two $\sigma$'s when $N$'s are small and/or
$N_1 \neq N_2$. When $N_1$ and $N_2$ are both large, or when $N_1$ and $N_2$ are
moderate values and $N_1 = N_2$ or nearly so, we may proceed as above. It was
previously noted that the sampling distribution of $\sigma$ is not normal when
$N$ is small. To test the significance of differences between standard
deviations, R. A. Fisher has suggested a transformation which is particularly
useful for small samples and which refers to $\hat{\sigma}_1$ and $\hat{\sigma}_2$ instead of $\sigma_1$ and $\sigma_2$.
This transformation is:

$$z = (\log_e \hat{\sigma}_1 - \log_e \hat{\sigma}_2) = \log_e \frac{\hat{\sigma}_1}{\hat{\sigma}_2},$$

or

$$z = \tfrac{1}{2}(\log_e \hat{\sigma}_1^2 - \log_e \hat{\sigma}_2^2) = \tfrac{1}{2} \log_e \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}.$$

<r 

® The standard error of the coefficient of variation {V ~ =) is 

where Y p (expressed as a decimal) refers to the coefficient of variation of the population. 
When - Fa 
Since adequate tables of natural logarithms are not available, we may
compute the value of $z$ by making use of the expression

$$\log_e X = \log_e 10 \cdot \log_{10} X = 2.302585 \log_{10} X.$$

Therefore

$$z = 2.302585 \log_{10} \frac{\hat{\sigma}_1}{\hat{\sigma}_2}$$

or

$$z = 1.15129 \log_{10} \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}.$$

Use of $z$ enables us to test the reliability of the difference between $\hat{\sigma}_1$ and
$\hat{\sigma}_2$ or between $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$. It must be obvious that the two expressions
are exactly the same; if there is a significant difference between $\hat{\sigma}_1$ and $\hat{\sigma}_2$,
there is also a significant difference between $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$.
The value of $z$ varies between plus and minus infinity, being negative
when

$$\hat{\sigma}_1 < \hat{\sigma}_2$$

and positive when

$$\hat{\sigma}_1 > \hat{\sigma}_2.$$




The distribution of $z$ is approximately normal when $N_1$ and $N_2$ are both
large, or when $N_1$ and $N_2$ are moderate and equal or nearly equal.⁶ Unless


⁶ See L. H. C. Tippett, The Methods of Statistics, pp. 117-120, Williams and Norgate,
London, 1937 (2nd Edition).

We have studied the reliability of $\sigma$ by making use of the distribution of $\chi^2$. Now
we shall study the significance of differences between variances by transforming the
computed values into $z$ values. Sometimes we must work with a distribution the exact
shape of which is not known. For any series of values, no matter how distributed, it
may be shown by Tchebycheff's inequality that the proportion of values lying beyond a
given symmetrical range of the mean $(\bar{X} \pm t\sigma)$ is less than $\frac{1}{t^2}$. That is, $P < \frac{1}{t^2}$. This
is an extremely conservative test of reliability. As is apparent from the expression,
the .05 level of significance ($P < .05$) is at $4.47\sigma$; the .02 level ($P < .02$) is at
$7.07\sigma$; the .01 level ($P < .01$) is at $10\sigma$; and the .001 level ($P < .001$) is at $31.62\sigma$.
If a distribution is unimodal, and if the mode is within $\sigma$ of the mean [that is, if
$N_1 = N_2$, the distribution of $z$ is skewed and, for simplicity of procedure,
we generally work only with positive values of $z$, which is accomplished by
considering the series with the larger standard deviation (or variance) as
the first series with subscript 1. This makes it necessary to consider only
the positive half of the skewed distribution of $z$. The significance of $z$
depends upon the value computed for $z$ and also upon $n_1 (= N_1 - 1)$ and
$n_2 (= N_2 - 1)$. To present a reasonably complete table of the
distribution of $z$ would require many pages. However, Fisher has prepared tables
for $P = .05$, $P = .01$, and $P = .001$ points for selected values of $n_1$ and $n_2$.
These are shown as Appendix G1.

As an illustration let us compare the variance of length of the two
groups of lower first molars previously mentioned. Notice that we are
comparing $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ rather than $\sigma_1^2$ and $\sigma_2^2$.

For the molars excavated at the first site:

$$\Sigma x_1^2 = 8.2944.$$
$$n_1 = N_1 - 1 = 16 - 1 = 15.$$
$$\hat{\sigma}_1^2 = \frac{8.2944}{15} = .553.$$

For the molars excavated at the second site:

$$\Sigma x_2^2 = 3.4596.$$
$$n_2 = N_2 - 1 = 9 - 1 = 8.$$
$$\hat{\sigma}_2^2 = \frac{3.4596}{8} = .432.$$

The computation of $z$ is based upon the foregoing.

$$z = 1.15129 \log_{10} .553 - 1.15129 \log_{10} .432$$
$$= 1.15129(9.742725 - 10) - 1.15129(9.635484 - 10)$$
$$= .1235.$$

skewness as measured by (mean − mode) ÷ $\sigma$ < ±1] we may apply the Camp-Meidell
inequality, which states that less than $\frac{1}{2.25\,t^2}$ of the items lie beyond
a given symmetrical range of the mean $\bar{X} \pm t\sigma$. Upon this basis, the .05 level ($P < .05$)
is at $2.98\sigma$; the .02 level ($P < .02$) is at $4.71\sigma$; the .01 level ($P < .01$) is at $6.67\sigma$; and
the .001 level ($P < .001$) is at $21.08\sigma$. See W. A. Shewhart, Economic Control of Quality of
Manufactured Product, pp. 175-176, D. Van Nostrand Co., New York, 1931; and B. H.
Camp, The Mathematical Part of Elementary Statistics, pp. 256-257, D. C. Heath,
Boston, 1931.
Alternatively,

$$z = 1.15129 \log_{10} \frac{.553}{.432}$$
$$= 1.15129 \log_{10} 1.2801$$
$$= .1235.$$

Referring to the $z$ table (Appendix G1), for $n_1 = 15$ and $n_2 = 8$, we find
that a value of $z$ of about .58 falls at the .05 point, while a value of $z$ of
about .85 falls at the .01 point. Consequently the chances of obtaining a
value of $z = .1235$ are appreciably greater than .05, and it appears that
there is not a significant difference between the two variances. It must
be apparent from the expression

$$z = \tfrac{1}{2} \log_e \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}$$

that the value of $z$ depends upon the ratio of $\hat{\sigma}_1^2$ to $\hat{\sigma}_2^2$ and not upon the
absolute value of either variance. For example, if $\hat{\sigma}_1^2 = 4$ and $\hat{\sigma}_2^2 = 2$,
then $z = \tfrac{1}{2} \log_e 2 = .34657$. Similarly, if $\hat{\sigma}_1^2 = 21.6$ while $\hat{\sigma}_2^2 = 10.8$, the
value of $z = \tfrac{1}{2} \log_e 2 = .34657$, as before. For this reason it is possible to
recast the table of $z$ in Appendix G1, and state it in terms of $F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}$, as
shown in Appendix G2. The larger variance should always be in the
numerator when we use this table. The $F$ table is a more convenient
table to use since it eliminates the necessity of looking up logarithms.
If we wish to use this table for testing the significance of
the difference between $\hat{\sigma}_1$ and $\hat{\sigma}_2$, we determine $F = \frac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}$, or we may
compute $\frac{\hat{\sigma}_1}{\hat{\sigma}_2}$ and square the resulting figure before entering the table.
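The relation between $z$ and the variance ratio $F$ can be sketched as follows. The function name is hypothetical; the comparison value .58 is the approximate .05 point for $n_1 = 15$, $n_2 = 8$ quoted in the text, and the rounded variances .553 and .432 are used so that the figures match the worked illustration.

```python
import math

def fisher_z_and_f(variance_a, variance_b):
    """Return (z, F) with the larger variance placed in the numerator.

    z = (1/2) * log_e(F), so z depends only on the ratio of the variances.
    """
    larger, smaller = max(variance_a, variance_b), min(variance_a, variance_b)
    f_ratio = larger / smaller
    return 0.5 * math.log(f_ratio), f_ratio

z_molars, f_molars = fisher_z_and_f(0.553, 0.432)   # lower first molars
z_a, _ = fisher_z_and_f(4.0, 2.0)
z_b, _ = fisher_z_and_f(21.6, 10.8)

Z_05_POINT = 0.58  # approximate .05 point for n1 = 15, n2 = 8 (from the text)
print(round(z_molars, 4), round(f_molars, 4), z_molars < Z_05_POINT)
```

Because only the ratio enters, the pairs (4, 2) and (21.6, 10.8) give the same $z$.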

The $z$ test may, of course, be used when the $N$'s are large. Let us
apply it to the data of I.Q.'s of right- and left-handed students previously
discussed. For 68 left-handed students, we found $\hat{\sigma}_1 = 15.2$; for 68
right-handed students, $\hat{\sigma}_2 = 15.5$. We can now proceed as before when
computing $z$, and interpolate in Fisher's table for $n_1 = 67$ and $n_2 = 67$. But
the sampling distribution of $z$ is approximately normal when $N_1$ and $N_2$
are both large, or when $N_1$ and $N_2$ are moderate and equal or nearly so.
It is therefore substantially accurate in the present instance to find the
ratio of $z$ to $\sigma_z$, and interpret by reference to the normal curve. Since
the distribution of $z$ is approximately normal and not skewed as in the
preceding illustration, it is not necessary that $z$ be positive.

$$z = 2.30259 \log_{10} 15.2 - 2.30259 \log_{10} 15.5$$
$$= 2.30259(1.181844) - 2.30259(1.190332)$$
$$= -.01954.$$
However, if we should compute $z$ from the expression

$$z = 2.30259 \log_{10} \frac{\hat{\sigma}_1}{\hat{\sigma}_2},$$

it would be necessary to consider the larger $\hat{\sigma}$ as $\hat{\sigma}_1$.
The value of $\sigma_z$ is given by:

$$\sigma_z = \sqrt{\frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{1}{2}\left(\frac{1}{67} + \frac{1}{67}\right)} = \sqrt{.01493} = .1222.$$

The ratio

$$\frac{z}{\sigma_z} = \frac{.01954}{.1222} = .16$$

is the same as that obtained when comparing $\sigma_1$ and $\sigma_2$ on page 344, and
indicates no significant difference between $\hat{\sigma}_1$ and $\hat{\sigma}_2$.

An Application of Reliability Measures

Control of quality of manufactured product may involve the adherence
to a predetermined standard in order to maintain at all times a high
degree of uniformity of product. Suppose a manufacturer wishes to control
his production of |-inch bolts so that, among other characteristics, the
breaking strength will not be less than 6,000 pounds. He cannot, of
course, maintain an entirely uniform product. Causes that affect the
uniformity of tensile strength are variations in the carbon content of the
steel, in impurities such as sulphur and phosphorus, in homogeneity of
structure, in conditions of heat treatment, in the diameter of the rod stock,
and in the manufacturing process. If each of the causes of variation
continues to have the same probability of contributing a given effect, then the
breaking strength may be said to be controlled in a technical sense. Thus,
by keeping the conditions of manufacture under control, an acceptable and
relatively uniform product may be assured.

If the cost of testing each bolt separately is prohibitive, or if the test is
destructive, it will be necessary to resort to sampling in order to ascertain
whether quality and uniformity are being maintained.⁷

⁷ Except that in certain instances an associated characteristic may be tested, as for
example, hardness may be used as an indicator of tensile strength. See F. E. Croxton
and D. J. Cowden, Practical Business Statistics, pp. 405-416, Prentice-Hall, Inc., New
York, 1934.

In order to establish statistical control, it is obviously necessary to set a standard level for
the average tensile strength of bolts. The first step, then, is to determine 
the arithmetic mean and the standard deviation of tensile strength of such 
bolts produced under standard conditions, and these measures are taken 
as the mean and the standard deviation of the entire population. If not 
more than about 1 bolt out of 1,000 is to be less than 6,000 pounds in 
tensile strength, it is necessary that the manufacturing process be estab- 
lished so that the average tensile strength will be larger than 6,000 by an 
amount equal to at least three standard deviations, since the tail of a
normal curve beyond $-3\sigma$ contains about $\frac{1.35}{1000}$ of the area.

Assuming that $\sigma_p$ = 320 pounds, $3\sigma_p$ = 960 pounds

Lowest permissible value for tensile strength = 6,000 pounds

Required $\bar{X}_p$ = 6,960 pounds.

It is now possible to estimate the limits within which the means of any
given proportion of the samples of size $N$ should fall if drawn from the
standard universe of bolts. These limits are, of course, $\bar{X}_p \pm \sigma_{\bar{X}}$ for the
means of 68.27 per cent of the samples, and $\bar{X}_p \pm 3\sigma_{\bar{X}}$ for 99.73 per cent
of the sample means. If each sample is to be of 4 bolts,

$$\sigma_{\bar{X}} = \frac{320}{\sqrt{4}} = 160 \text{ pounds.}$$

Since we have $\sigma_p$ instead of $\hat{\sigma}$, we make use of the areas under the normal
curve rather than the $t$ curve.

Therefore, 99.73 per cent of the means of samples of 4 bolts each should
vary between 6,960 ± 3(160), and only about 1 out of every 1,000 will
be less than 6,960 − 3(160) = 6,480 if they are drawn from this universe.
If a sample mean is less than 6,480, as in Chart 136, this result indicates 
that it was probably not drawn from the standard universe, and that 
therefore lack of control exists. The cause of the difficulty should be 
traced. Had the sample been of 16 rather than of 4, the allowable limit 
for a sample mean would have been only half as far below the standard 
mean, since the reliability of a sample increases as the square root of the 
number of items included. Engineers have adopted the 3cr limits as an 
indication of lack of control, not so much because of the statistical proba- 
bility value of 99.73 per cent as because these limits have proved to be 
satisfactory and economic in practice. 
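The control-limit arithmetic of this illustration may be sketched as follows. The function names and the default multiplier of three standard deviations are illustrative choices that follow the $3\sigma$ convention described in the text.

```python
import math

def required_mean(lowest_permissible, sigma_p, multiple=3):
    """Standard mean placed `multiple` standard deviations above the lowest permissible value."""
    return lowest_permissible + multiple * sigma_p

def control_limits(standard_mean, sigma_p, sample_size, multiple=3):
    """Lower and upper control limits for the means of samples of a given size."""
    se_mean = sigma_p / math.sqrt(sample_size)
    return standard_mean - multiple * se_mean, standard_mean + multiple * se_mean

mean_p = required_mean(6000, 320)            # pounds
limits_n4 = control_limits(mean_p, 320, 4)   # samples of 4 bolts
limits_n16 = control_limits(mean_p, 320, 16) # samples of 16 bolts
print(mean_p, limits_n4, limits_n16)
```

With samples of 16 the lower limit lies 240 pounds below the standard mean instead of 480, which is the halving referred to in the text.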

Samples indicate probability only, not certainty. It is quite possible
that lack of manufacturing control might exist without being brought to
light by this procedure. It is even possible that a sample mean might
Chart 136. Arithmetic Means of the Tensile Strength of Successive Test Samples
Each of Four |-Inch Bolts. (Based on a chart from H. F. Dodge, “Statistical Control
in Sampling Inspection,” American Machinist, October 26 and November 9, 1932.)

Chart 137. Standard Deviations of the Tensile Strength of Successive Test Samples
Each of Four |-Inch Bolts. (Based on a chart from H. F. Dodge, “Statistical Control
in Sampling Inspection,” American Machinist, October 26 and November 9, 1932.)
be well above the control limit and yet that individual bolts in that sample
might have a tensile strength of less than 6,000 pounds; this could easily
be true if the variation within a sample was large. To further assure
satisfactory quality as well as a uniform product, it is desirable also to
set up in analogous fashion control limits for the standard deviation of
samples and to record the results of successive samples as in Chart 137.⁸

Analysis of Variance 

The analysis of variance provides a basis for comparing not only two, 
but any number of series simultaneously. 

In Table 73 data are shown of the length of English cuckoo eggs which 
were deposited in the nests of three different species of birds. The cuckoo 
makes a practice of permitting other birds to hatch its eggs and rear its 
offspring. We are interested in knowing if the lengths of cuckoo eggs 
laid in the nests of the tree-pipit, the pied wagtail, and the wren show 
significant differences. We could, of course, compute the three means 
and compare the first mean with the second, the first with the third, and 
the second with the third. This, however, does not measure the variabil- 
ity of the three groups as a whole. 

Following the concept mentioned in Chapter IV when constructing line
diagrams, we shall regard the type of nest as the $X$ variable, or
independent variable, and the length of eggs as the $Y$ variable, or dependent
variable. There were fifteen eggs measured from each of the three types of
nests, a total of 45 measurements as shown in the table. The total
variation present in the entire series of measurements is measured by the sum
of the squared deviation of each measurement from the mean of all 45
measurements. Thus the total variation is

$$\Sigma y^2 = \Sigma (Y - \bar{Y})^2.$$

The variation within the groups is measured by considering the deviation
of each measurement from the mean of its group; these deviations are
then squared and summed. Letting $\bar{Y}_1$ refer to the mean of the first
column, and $\bar{Y}_2$ and $\bar{Y}_3$ the means of the succeeding columns, we have:


⁸ This application of probability theory is summarized from a paper presented by
H. F. Dodge of the Bell Telephone Laboratories on “Statistical Control in Sampling
Inspection,” at the annual meeting of the American Society for Testing Materials,
Atlantic City, June 20, 1932, and published in American Machinist, October 26 and
November 9, 1932. See also W. A. Shewhart, Economic Control of Quality of
Manufactured Product, D. Van Nostrand Co., New York, 1931; and 1933 A. S. T. M. Manual
on Presentation of Data, 1933, with Supplements A (Presenting Plus and Minus Limits
of Uncertainty of an Observed Average) and B (“Control Chart” Method of Analysis
and Presentation of Data), 1935. American Society for Testing Materials, Philadelphia,
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TABLE 73 

Length op Cuckoo’s Eggs Deposited in the Nests op Three Species op Birds 


Order of 
measure- 

ment 

Tree-pipit 

Pied Wagtail 

Wren 

Length m 
millimeters 
Fi 

F? 

Length in 
millimeters 

Y2 

F| 

Length in 
millimeters 
Fs 

n. 

1 

22 7 

515 

29 

23 0 

‘ 529 00 

19.8 

392 04 

2 

23 3 

542 

89 

23 4 

547 56 1 

22 1 i 

488 41 

3 i 

24 0 

576 

00 

i 24 0 

576 00 

21 5 

462.25 

4 

23 6 

556 

96 

23 3 

542 89 

20 9 

436 81 

5 

22 1 

488 

41 

23 1 

533 61 

22 0 

484 00 

6 

218 

475. 

.24 

22 4 

501 76 

210 

441 00 

7 

21 1 

445 

21 

218 

475 24 

22 3 

497 29 

8 

23 4 

547 

56 

218 

475 24 

210 

441 00 

9 

' 23 8 

566 

44 

24 9 

620 01 

20 3 

412 09 

10 

23.3 

542 

89 

24.0 

576.00 

20.9 

436 81 

11 

24 0 

576 

00 

22 1 

488 41 

22 0 

484 OC 

12 

23 5 

552 

25 

210 

441 00 

20 0 

400 Oe 

13 

23 2 

538 

24 

22 6 

510.76 

20 8 

432 64 

14 

24 0 

576 

00 

21.9 

479 61 

212 

449 44 

15 

22 4 

501 

76 

24.0 

576 00 

210 

1 441 00 

Total 

346.2 

SFi 

8,001.14 
SFf j 

343.3 

2)3^3 

7,873 09 

SFf 

316 8 

21^3 

6,698.78 

SFi 


Source Oswald H Latter, “The Egjc of Cuculua Canorus,” Biometnka, Vol I, pp 164-176. 

SF = 346 2 -f 343.3 + 316 8 = 1,006 3 
(27)2 = (1,006 3)2 = 1,012,639.69 
SF2 = 8,001 14 + 7,873 09 6,698.78 = 22,573 01 

m {Nk \2 

2 ( S Fj = (2Fi)2 + (2Fa)2 + (2F3)2 

= (346 2)2 + (343 3)2 + (316 8)2 - 338,071.57. 

Degrees of freedom: 


Between nests 

(columns) . . 

., 2 

Nk * 15 

Within nests 

(columns) . , . 

, . 42 

Y =45 



— 

m — Z 


Total,,. , 

... 44 
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For the first column: 
For the second column: 
For the third column: 


2 (F - 7iy 

I 

2 (F ~ 72^ 
1 

2 (F ~ Fs)^ 


For all columns: 


2 (F 
. 1 




vhere Nk represents the number of items in a column; 
Yk represents the mean of a column; 

2 indicates a summation for a column; 


m 


2 indicates a summation of the m columns; 

1 

2 indicates, as before, a summation for the entire series. 


The variation between the groups is measured by referring to the
deviation of each group mean from the grand mean (the mean of all the
data); these deviations are squared, multiplied by the number of items
in the group, and summed. Thus:

For the first column:   $N_1(\bar{Y}_1 - \bar{Y})^2$

For the second column:  $N_2(\bar{Y}_2 - \bar{Y})^2$

For the third column:   $N_3(\bar{Y}_3 - \bar{Y})^2$

For all columns:        $\sum_1^{m} \left[ N_k(\bar{Y}_k - \bar{Y})^2 \right]$

It is shown in Appendix B (section XIII-2) that

$$\sum (Y - \bar{Y})^2 = \sum_1^{m} \left[ \sum_1^{N_k} (Y - \bar{Y}_k)^2 \right] + \sum_1^{m} \left[ N_k(\bar{Y}_k - \bar{Y})^2 \right]$$

or, in other words, that total variation = variation within columns +
variation between columns.
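The identity can be checked directly from deviations. The sketch below is only an illustration: the data are the 45 lengths of Table 73, while the function and variable names are chosen here and do not come from the text.

```python
# Egg lengths in millimeters, copied from Table 73.
tree_pipit = [22.7, 23.3, 24.0, 23.6, 22.1, 21.8, 21.1, 23.4,
              23.8, 23.3, 24.0, 23.5, 23.2, 24.0, 22.4]
pied_wagtail = [23.0, 23.4, 24.0, 23.3, 23.1, 22.4, 21.8, 21.8,
                24.9, 24.0, 22.1, 21.0, 22.6, 21.9, 24.0]
wren = [19.8, 22.1, 21.5, 20.9, 22.0, 21.0, 22.3, 21.0,
        20.3, 20.9, 22.0, 20.0, 20.8, 21.2, 21.0]


def mean(values):
    return sum(values) / len(values)


def variation_components(columns):
    """Return (total, within, between) sums of squared deviations."""
    all_values = [y for column in columns for y in column]
    grand_mean = mean(all_values)
    # Each measurement against the mean of all measurements.
    total = sum((y - grand_mean) ** 2 for y in all_values)
    # Each measurement against the mean of its own column.
    within = sum(sum((y - mean(column)) ** 2 for y in column)
                 for column in columns)
    # Each column mean against the grand mean, weighted by column size.
    between = sum(len(column) * (mean(column) - grand_mean) ** 2
                  for column in columns)
    return total, within, between


total, within, between = variation_components(
    [tree_pipit, pied_wagtail, wren])
print(f"total {total:.2f} = within {within:.2f} + between {between:.2f}")
```

Apart from rounding, the three printed figures should agree with the total, within-column, and between-column variations obtained in this section by the short method.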

If we divide the variation within columns by the degrees of freedom
present (14 + 14 + 14 = 42, in this case), we obtain a measure of the
variance within columns. If we divide the variation between columns
by the degrees of freedom (2 in this instance), we obtain a measure of
the variance between columns. Now the variance within columns (within
nests) is clearly a chance variation. If the variance between columns
(between types of nests) exceeds the former significantly, then there is a
significant difference in length of eggs in these three types of nests. Or,
it may be said that the total variance in length of eggs is partly
explained by the types of nests in which they are found, while the variance
within nests is a chance variation because it has not been explained, nor,
in fact, did the hypothesis even attempt to explain it.

Appendix B, section XIII-2.

In computing variance in this discussion, we always use degrees of freedom. Thus
all variances are estimates of population variance rather than statements of the sample
variance. In the later chapters on correlation, we shall make use of measures of
variance based upon N rather than n.

The variation within columns,
$\sum_1^{m} \left[ \sum_1^{N_k} (Y - \bar{Y}_k)^2 \right]$, could be
computed by determining the mean of each column, taking the deviation of
each of the 45 measurements from the appropriate column mean, squaring,
and adding. In computing $\sigma$, we developed a short method which made
it unnecessary to work with deviations. A similar device may be employed
here, and in Appendix B (section XIII-2) it is shown that

$$\sum_1^{m} \left[ \sum_1^{N_k} (Y - \bar{Y}_k)^2 \right] = \sum Y^2 - \frac{\sum_1^{m} \left( \sum_1^{N_k} Y \right)^2}{N_k}$$

if there is the same number of items in the various columns. Referring
to the computed values shown below Table 73, we find

$$\sum Y^2 - \frac{\sum_1^{m} \left( \sum_1^{N_k} Y \right)^2}{N_k} = 22{,}573.01 - \frac{338{,}071.57}{15} = 22{,}573.01 - 22{,}538.10 = 34.91.$$


This figure is entered in Table 74 as the variation or sum of the squared
deviations within columns. The mean variance or merely variance within
columns, as it is usually termed, is obtained by dividing this figure by
the degrees of freedom. Since there are 15 items in each column and
since the squared deviations were taken in reference to the mean of each
column, there are 14 degrees of freedom in each column or 3 X 14 = 42
degrees of freedom within columns. The variance within columns is
34.91 / 42 = .8312, which is also shown in Table 74.

The variation between columns,
$\sum_1^{m} \left[ N_k(\bar{Y}_k - \bar{Y})^2 \right]$, may also be obtained
without computing means or deviations. It is shown in Appendix B
(section XIII-2) that

$$\sum_1^{m} \left[ N_k(\bar{Y}_k - \bar{Y})^2 \right] = \frac{\sum_1^{m} \left( \sum_1^{N_k} Y \right)^2}{N_k} - \frac{\left(\sum Y\right)^2}{N}$$

if there is the same number of items in each column. Again referring to
Table 74, we have

$$\frac{\sum_1^{m} \left( \sum_1^{N_k} Y \right)^2}{N_k} - \frac{\left(\sum Y\right)^2}{N} = \frac{338{,}071.57}{15} - \frac{1{,}012{,}639.69}{45} = 22{,}538.10 - 22{,}503.10 = 35.00.$$

This figure is entered in Table 74 as the variation or sum of the squared
deviations between columns. The variance between columns is now
obtained by dividing this figure by the degrees of freedom. There are 2
degrees of freedom between columns since the 3 column means were
considered in relation to the grand mean, thus losing 1 degree of freedom.
The variance between columns is 35.00 / 2 = 17.50, which is also entered in
Table 74.

TABLE 74

Analysis of Variance of the Length of Cuckoo's Eggs Deposited in Nests of
Three Species of Birds

                                      Variation or      Degrees
Source of variation                   sum of squared    of         Variance
                                      deviations        freedom

Within nests (i.e., within columns
  of Table 73)                            34.91           42         .8312
Between nests (i.e., between columns
  of Table 73)                            35.00            2       17.50
Total                                     69.91           44

The total variation was computed from

$$\sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{\left(\sum Y\right)^2}{N} = 22{,}573.01 - \frac{1{,}012{,}639.69}{45} = 22{,}573.01 - 22{,}503.10 = 69.91.$$

See Appendix B, section XIII-2. Total variation is shown in this table for checking
purposes, since it is the total of the two figures above it. Likewise, total degrees of
freedom (N - 1) is the sum of the two figures preceding. Total variance, however, is
not the sum of the other two variances. It may be computed by dividing total
variation by total degrees of freedom. Thus 69.91 / 44 = 1.589.
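The short method amounts to three subtractions on sums already available. A minimal sketch, assuming only the totals printed beneath Table 73 (the variable names are illustrative, not the authors'):

```python
# Sums taken from the computations shown below Table 73.
sum_y_squared = 22573.01             # sum of the 45 squared lengths
column_totals = [346.2, 343.3, 316.8]
items_per_column = 15                # N_k, the same for every column
m = len(column_totals)               # number of columns
n_total = m * items_per_column       # N = 45

# Two correction terms; each is a squared total divided by its item count.
columns_term = sum(t ** 2 for t in column_totals) / items_per_column
grand_term = sum(column_totals) ** 2 / n_total

variation_within = sum_y_squared - columns_term
variation_between = columns_term - grand_term
variation_total = sum_y_squared - grand_term

variance_within = variation_within / (n_total - m)   # 42 degrees of freedom
variance_between = variation_between / (m - 1)       # 2 degrees of freedom

print(f"within  {variation_within:7.2f}  df {n_total - m:2d}  {variance_within:.4f}")
print(f"between {variation_between:7.2f}  df {m - 1:2d}  {variance_between:.2f}")
print(f"total   {variation_total:7.2f}  df {n_total - 1:2d}")
```

Because the text rounds the variation to 34.91 before dividing by 42, its variance within columns (.8312) may differ in the fourth decimal place from the unrounded figure printed here.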

The variance within nests (columns) is .8312, while the variance between
nests (columns) is 17.50. As previously mentioned, the variance within
nests (columns) can logically be considered as due to chance causes. If
the variance between nests does not exceed the variance within nests
significantly, we may conclude that the variance between nests also is due
to chance. The variance between nests is much greater than the variance
within nests and hardly needs to be tested for significance. The test which
is used is the same z test previously applied to two values of
$\hat{\sigma}^2$ when $N_1 \neq N_2$. It will be remembered that the larger
variance appears in the numerator in the expression for z, in order that
the value of z will be positive.

$$z = 1.15129 \log_{10} \frac{17.50}{.8312} = 1.15129 \log_{10} 21.05 = 1.52345.$$

Referring to Appendix G1, mentioned before, and using degrees of freedom
$n_1 = 2$ and $n_2 = 42$, we find that z = 1.52345 lies beyond the
one-tenth of one per cent point and the difference is clearly significant.
The variation of egg length with type of nest is therefore almost certainly
real. Computing F = 17.50 / .8312 = 21.05 and referring to Appendix G2 for
$n_1 = 2$ and $n_2 = 42$, the conclusion is, of course, exactly the same as
that arrived at by use of the z test.
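The relation between z and F, and the smallness of the probability involved, can be sketched as follows. The constant 1.15129 is half the natural logarithm of 10, so z is simply half the natural logarithm of F. The tail-probability function below uses a closed form that holds only when the first degrees of freedom equal 2; it is offered as a check on the table lookup, not as the authors' procedure, and all names are illustrative.

```python
import math


def fisher_z(larger_variance, smaller_variance):
    """Fisher's z: half the natural logarithm of the variance ratio F."""
    return 0.5 * math.log(larger_variance / smaller_variance)


def upper_tail_f_n1_equals_2(f_value, n2):
    """P(F >= f_value) for n1 = 2 and n2 degrees of freedom.

    This closed form is valid only for n1 = 2; other values of n1
    need tables or a numerical routine.
    """
    return (1.0 + 2.0 * f_value / n2) ** (-n2 / 2.0)


variance_between, variance_within = 17.50, 0.8312
f_ratio = variance_between / variance_within
z = fisher_z(variance_between, variance_within)
z_common_logs = 1.15129 * math.log10(f_ratio)   # the form used in the text
p_value = upper_tail_f_n1_equals_2(f_ratio, 42)

print(f"F = {f_ratio:.2f}, z = {z:.4f}, z via common logs = {z_common_logs:.4f}")
print(f"probability of so large an F by chance: {p_value:.1e}")
```

The two forms of z agree to about four decimals; the small difference from 1.52345 arises because the text rounds F to 21.05 before taking the logarithm.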

Compare with Tippett (The Methods of Statistics, pp. 132-134, 2nd Edition), who
comes to the same conclusion using 6 nest-types with $N_k$ ranging from 14 to 45.

In Table 75, data are given of the strength of the lead in five pencils
made by a company identified only as "Company X." Four tests were
made of the lead of each pencil. We are interested in knowing if the
variance between pencils is significantly greater than the variance within
pencils, in order to find out whether or not there is uniformity of strength
from pencil to pencil.

Strength of Lead in Number 2 Pencils Manufactured by "Company X"

Variation within pencils (columns) is

$$\sum Y^2 - \frac{\sum_1^{m} \left( \sum_1^{N_k} Y \right)^2}{N_k} = 62.3517 - \frac{247.8193}{4} = 62.3517 - 61.954825 = .396875.$$

Variance within pencils (5 X 3 degrees of freedom) is

$$\frac{.396875}{15} = .02646.$$

Variation between pencils (columns) is

$$\frac{\sum_1^{m} \left( \sum_1^{N_k} Y \right)^2}{N_k} - \frac{\left(\sum Y\right)^2}{N} = \frac{247.8193}{4} - \frac{(35.17)^2}{20} = 61.954825 - 61.846445 = .108380.$$

Variance between pencils (4 degrees of freedom) is

$$\frac{.108380}{4} = .027095.$$

These figures are summarized in Table 76 together with the figures for
total variation and total degrees of freedom for purposes of checking.

TABLE 76

Analysis of Variance of Strength of Lead in Number 2 Pencils Manufactured
by "Company X"

                        Variation or      Degrees
Source of variation     sum of squared    of         Variance
                        deviations        freedom

Within pencils             .396875          15        .026458
Between pencils            .108380           4        .027095
Total                      .505255          19

Total variation is computed from

$$\sum Y^2 - \frac{\left(\sum Y\right)^2}{N} = 62.3517 - \frac{1236.9289}{20} = 62.3517 - 61.846445 = .505255.$$

It will be observed that the two variances are very nearly alike. Making
the z test,

$$z = 1.15129 \log_{10} \frac{.027095}{.026458} = 1.15129 \log_{10} 1.024 = .01186.$$

$n_1$ (degrees of freedom between pencils) is 4, while $n_2$ (degrees of
freedom within pencils) is 5 X 3 = 15. Consulting Appendix G1, it appears
that when $n_1 = 4$ and $n_2 = 15$, a value of z = .5585 falls on the .05
level of significance. Since the z obtained above is much less than this, it
cannot be said that there is a significant difference between these
variances. Instead of using z, we may use F = .027095 / .026458 = 1.024,
$n_1 = 4$, $n_2 = 15$, and consult Appendix G2. The conclusion is identical.
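The whole pencil test can be retraced from the three sums quoted in the computation, as sketched below. The critical value .5585 is the .05 point of z for 4 and 15 degrees of freedom as read from the text; everything else (names, layout) is an assumption made for illustration.

```python
import math

# Sums quoted in the pencil computations.
sum_y_squared = 62.3517           # sum of the 20 squared test values
sum_sq_pencil_totals = 247.8193   # sum of the five squared pencil totals
grand_total = 35.17               # sum of all 20 test values
tests_per_pencil, pencils = 4, 5
n_total = tests_per_pencil * pencils

variation_within = sum_y_squared - sum_sq_pencil_totals / tests_per_pencil
variation_between = (sum_sq_pencil_totals / tests_per_pencil
                     - grand_total ** 2 / n_total)

df_within = pencils * (tests_per_pencil - 1)   # 5 x 3 = 15
df_between = pencils - 1                       # 4
variance_within = variation_within / df_within
variance_between = variation_between / df_between

# The larger variance goes in the numerator so that z is positive.
larger = max(variance_within, variance_between)
smaller = min(variance_within, variance_between)
z = 0.5 * math.log(larger / smaller)

Z_AT_05_LEVEL = 0.5585   # quoted in the text for n1 = 4, n2 = 15
significant = z > Z_AT_05_LEVEL
print(f"z = {z:.4f}; significant at the .05 level: {significant}")
```

The printed z differs slightly from .01186 because the text rounds the variance ratio to 1.024 before taking the logarithm; the decision is unaffected.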

If the analysis of variance had shown a lack of uniformity between
pencils, it must be apparent that a condition of general non-uniformity
was indicated. The lack of uniformity between pencils might have been
due to a large defection on the part of one or two pencils, while the others
might have been relatively uniform. We could, of course, compare the
mean of each pencil with each other one, as previously mentioned, and
thus perhaps learn something more specific about the location of the
non-uniformity.

In the foregoing paragraphs the analysis of variance has been applied
only to distributions in which $N_1 = N_2 = N_3 =$ etc. This is not a
necessary condition and the formulae are but slightly altered if
$N_1 \neq N_2 \neq N_3 \neq$ etc., as is shown in Appendix B, section XIII-2.

Further considerations of the analysis of variance will be found in the
works of Fisher, Snedecor, and Tippett listed at the end of this chapter.

Criterion of Likelihood 

Comparison of several $\sigma$'s. Manufacturing concerns that are interested
in maintaining uniformity of a product may find it necessary to compare,
not merely two $\sigma$'s or $\sigma^2$'s, but a number of measures of
variance (or uniformity) from samples of their product selected
periodically or from each lot produced.

It is interesting to observe that, when there are two columns, the expression

$$\frac{\sum_1^{2} \left[ \sum_1^{N_k} (Y - \bar{Y}_k)^2 \right]}{N_1 + N_2 - 2},$$

or the variance within columns, is equivalent to $\hat{\sigma}^2$, the
estimate of $\sigma_P^2$ based on two samples assumed to have been drawn from
the same population, used in the preceding chapter. Furthermore, when only two
categories are being considered, the significance of the variance between the
two means (as given by the z or F test) is the same as the significance of the
difference between the two means (as given by the t test). Both tests attempt
to ascertain if the two samples were drawn from the same population.

One method of comparing a series of sample $\sigma$'s would be to compare
$\sigma_1$ with $\sigma_2$, $\sigma_1$ with $\sigma_3$, $\sigma_2$ with
$\sigma_3$, etc. Another procedure involves comparing all of the $\sigma$'s
at once by means of a criterion of likelihood,

$$L = \frac{\sqrt[k]{\sigma_1^2 \times \sigma_2^2 \times \sigma_3^2 \times \cdots \times \sigma_k^2}}{\frac{1}{k}\left(\sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \cdots + \sigma_k^2\right)}$$

where k is the number of samples. When the samples are of varying
numbers of items,

$$L = \frac{\sqrt[N]{(\sigma_1^2)^{N_1} \times (\sigma_2^2)^{N_2} \times (\sigma_3^2)^{N_3} \times \cdots \times (\sigma_k^2)^{N_k}}}{\frac{1}{N}\left(N_1\sigma_1^2 + N_2\sigma_2^2 + N_3\sigma_3^2 + \cdots + N_k\sigma_k^2\right)}$$

where $N_1$, $N_2$, etc., are the number of items in the respective samples and
$N = N_1 + N_2 + N_3 + \cdots + N_k$.

The numerator is the geometric mean of the $\sigma^2$'s, while the denominator
is the arithmetic mean of the $\sigma^2$'s. If there is any difference between
the various $\sigma^2$'s, the numerator will be smaller than the denominator. If
all of the $\sigma$'s are identical, there will be a condition of maximum uniformity
and L = 1. The value of L has 0 as its lower limit, which represents a
condition of maximum non-uniformity. This is a theoretical limit, which
would not be approached in actual practice.
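The two forms of L can be written as one function, since equal sample sizes are a special case of weighted means. The sketch below is an illustration only: the function name and the sample variances are hypothetical and do not come from the text.

```python
import math


def likelihood_criterion(variances, sizes=None):
    """Ratio of the weighted geometric mean of the variances to their
    weighted arithmetic mean, each variance weighted by its sample size.

    With sizes omitted (or all equal) this reduces to the simple
    geometric mean divided by the simple arithmetic mean.
    """
    if sizes is None:
        sizes = [1] * len(variances)
    weight_total = sum(sizes)
    # Geometric mean computed through logarithms.
    log_geometric = sum(n * math.log(v)
                        for n, v in zip(sizes, variances)) / weight_total
    arithmetic = sum(n * v for n, v in zip(sizes, variances)) / weight_total
    return math.exp(log_geometric) / arithmetic


# Hypothetical variances, chosen only to show the behaviour of L.
identical = likelihood_criterion([2.0, 2.0, 2.0])
unequal = likelihood_criterion([1.0, 2.0, 4.0])
unequal_sizes = likelihood_criterion([1.0, 2.0, 4.0], sizes=[10, 5, 5])

print(f"identical variances: L = {identical:.3f}")
print(f"unequal variances:   L = {unequal:.3f}")
print(f"unequal sample sizes: L = {unequal_sizes:.3f}")
```

Identical variances give L = 1, and any inequality among them pulls L below 1, in keeping with the limits described above.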

Let us compute the value of L for tests of strength of five pencils shown
in Table 75. The first step is to compute the value of $\hat{\sigma}_1^2$,
the variance of the first pencil. We use $\hat{\sigma}_1^2$ instead of
$\sigma_1^2$ because the samples are of N = 4.

$$\hat{\sigma}_1^2 = \frac{\sum Y_1^2}{N_1 - 1} - \frac{\left(\sum Y_1\right)^2}{N_1(N_1 - 1)} = \frac{11.9420}{3} - \frac{(6.90)^2}{4(3)} = 3.980664 - 3.967500 = .01316.$$

In similar fashion we obtain:

$$\hat{\sigma}_2^2 = .05667, \quad \hat{\sigma}_3^2 = .02787, \quad \hat{\sigma}_4^2 = .01930, \quad \hat{\sigma}_5^2 = .01529.$$

See J. Neyman and E. S. Pearson, "On the Problem of k Samples," Akademja
Umiejętności, Bulletin International de l'Académie Polonaise des Sciences et des Lettres,
Série A, Sciences Mathématiques, 1931, pp. 460-481.

$$L = \frac{\sqrt[5]{.01316 \times .05667 \times .02787 \times .01930 \times .01529}}{\frac{1}{5}(.01316 + .05667 + .02787 + .01930 + .01529)}.$$

Using logarithms to determine the fifth root of the products of the five
variances, we have:

$$L = \frac{.02278}{.02646} = .86,$$

and it appears that there is uniformity among the variances of the five
pencils tested.

Similar tests were made of pencils manufactured by five other companies.
The product of one company showed L = .92, while at the other extreme
another company's product gave L = .30.

It may be desirable to test the reliability of L. This is facilitated by
referring to Nayer's tables, which are shown in Appendix H and which
are constructed upon the assumption that the various samples have been
drawn from a normal population. As noted before, L = 1 if all values of
$\hat{\sigma}^2$ are identical. We therefore set up the hypothesis that
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$, and ascertain whether a
value of L such as that observed might occur by pure chance. It should be
noted that the curve of the sampling distribution of L has its maximum
ordinate at 1.0 and slopes downward (concave upward) to L = 0. It depends
upon N (the number in a sample) and k. In the above instance we found
L = .86, N = 4, k = 5. From Appendix H it is observed that L = .491 at the
.05 level, and .370 at the .01 level. We conclude that the value obtained
for L is not significantly less than 1; therefore our hypothesis is not
impugned, and L = .86 for these pencils indicates real uniformity. Consider
now the pencils manufactured by another firm which showed the least
uniformity and for which L = .30. Here also N = 4, k = 5. Since the value of
L = .30 is beyond the .01 level, the hypothesis of equality of variance of
these pencils is very dubious, and we conclude that this product is clearly
not uniform in regard to strength.
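The computation of L by logarithms and the comparison with the tabled limits can be sketched together. The five variances and the limits .491 and .370 (for N = 4 and k = 5) are those quoted in the text; the function names are illustrative assumptions.

```python
import math

# Variances of the five pencils, as quoted in the text.
variances = [0.01316, 0.05667, 0.02787, 0.01930, 0.01529]

# Limits quoted from Appendix H for samples of N = 4 and k = 5.
L_AT_05_LEVEL = 0.491
L_AT_01_LEVEL = 0.370


def criterion_by_logs(values):
    """Geometric mean (found through common logarithms) over arithmetic mean."""
    k = len(values)
    mean_log = sum(math.log10(v) for v in values) / k
    geometric_mean = 10 ** mean_log          # the k-th root of the product
    arithmetic_mean = sum(values) / k
    return geometric_mean / arithmetic_mean


def verdict(l_value):
    """Small values of L cast doubt on the hypothesis of equal variances."""
    if l_value < L_AT_01_LEVEL:
        return "beyond the .01 level"
    if l_value < L_AT_05_LEVEL:
        return "beyond the .05 level"
    return "not significantly less than 1"


l_pencils = criterion_by_logs(variances)
print(f"L = {l_pencils:.2f}: {verdict(l_pencils)}")
print(f"L = 0.30: {verdict(0.30)}")
```

Note that the test is one-sided: only values of L below the tabled limits count against the hypothesis of equal variances.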

In this and the preceding section we have examined two measures of
uniformity: (1) the analysis of variance which failed to indicate lack of
uniformity of means from pencil to pencil, and (2) the coefficient of
likelihood which has indicated uniformity of the variance within pencils.
Observe that we have not considered whether or not this brand, designated
"Pencil D," is stronger than some other make of pencil. As a matter of
fact, it was significantly less strong than one other of six brands tested,
data of which are not included in this volume. We do know, however,
that this brand possesses uniform strength, both within pencils and from
pencil to pencil.

In these two chapters on reliability and significance we have discussed
at some length the reliability of the arithmetic mean, of a percentage,
and of the standard deviation, and also the significance of differences of
such measures and of the variance between samples. In Chapter XI and
in this chapter we made use of P values based upon sampling
distributions, since statistical measures computed from samples are subject
to sampling variation. In the later chapters dealing with correlation we
shall discuss the reliability of various forms of the correlation coefficient.

Selected References 

(Note: A number of references concerning have been given at the end of 
Chapter XI.) 

R. A. Fisher: Statistical Methods for Research Workers (Seventh Edition), Chapters
VII, VIII; Oliver and Boyd, Edinburgh, 1938. Intraclass correlation and the
analysis of variance.

R. A. Fisher and F. Yates: Statistical Tables for Biological, Agricultural, and
Medical Research; Oliver and Boyd, Edinburgh, 1938.

A. B. Hill: Principles of Medical Statistics, Chapters IX, X; Lancet Limited,
London, 1937. Use of χ² with applications to medical data.

F. C. Mills: Statistical Methods Applied to Economics and Business (Revised
Edition), pages 473-474, 490-500; Henry Holt and Co., New York, 1938.
Significance of difference between proportions and analysis of variance.

P. R. Rider: An Introduction to Modern Statistical Methods, pages 81-83, 117-119,
132-150; John Wiley and Sons, New York, 1939. Significance of difference
between proportions and analysis of variance.

G. W. Snedecor: Calculation and Interpretation of Analysis of Variance and
Covariance; Collegiate Press, Ames, Iowa, 1934.

G. W. Snedecor: Statistical Methods Applied to Experiments in Agriculture and
Biology, Chapters 10, 11; Collegiate Press, Ames, Iowa, 1937. Analysis of
variance.

L. H. C. Tippett: The Methods of Statistics (Second Edition), pages 84-86, 117-121,
Chapter VI; Williams and Norgate, London, 1937. Reliability of and
significance of difference between standard deviations (variances); criterion
of likelihood, analysis of variance.



CHAPTER XIV 

THE PROBLEM OF TIME SERIES 


The Problem Stated 

Economists are interested in two types of problems. First, it is
possible to make an analysis of the situation which would logically exist if
economic forces were in a state of equilibrium with no changes taking
place. Starting with this ideal situation, we may then assume certain
changes, and the new equilibrium which will finally result may be
described. Thus, if the demand for a commodity were to increase a
specified extent, how much would production expand, and how much would
the price change? Such an analysis is usually referred to as static. A
second type of analysis, often referred to as dynamic, which has engaged
the attention of economists aims to explain what happens while the system
is trying to reach equilibrium, rather than the situation that exists after
this state is achieved.

It is but natural that economists with a scientific bent should devote 
much energy to the second type of analysis. Such analysis is concerned 
with the behavior of time series. Having undertaken investigations along 
this line, it is not surprising that tools would be developed especially suited 
to their purposes, and so we find that economics is largely responsible for 
the development of statistical methods for analyzing changes taking place 
over time. These methods are quite distinct from, though closely related 
to, frequency distribution analysis. Although the technique of time series 
analysis has been developed largely by economic statisticians, the study 
of time series is of interest to a wide range of people including businessmen, 
sociologists, biologists, doctors, and public health workers. 

Characteristics of Time Series 

Economists are not in complete agreement concerning the meaning of
the various movements constituting time series, or the proper methods
for their analysis. But even though the classification and the explanation
of some movements be in doubt, certain characteristics of time series are
apparent upon very brief inspection. The movements which we shall
consider in some detail are secular trend, cyclical, periodic, and irregular.

Secular trend. The gradual growth over a period of decades is perhaps
the most striking characteristic of most industries. This is to be expected
in a country such as the United States, with steadily growing population.
But industrial growth is not completely accounted for by population
changes; this is indicated by Chart 138. The natural sciences have been
applied to industry and agriculture so as to increase their output
enormously. Keeping pace with these technological changes and induced by
them have been changes in business organization and methods. The
growth of the corporation has permitted the accumulation of sufficient
capital for specialization and mass production, while scientific management
and personnel management have found their way into many organizations.

Chart 138. Index of Physical Production in the United States, 1870-1937, and
Population of Continental United States, 1870-1940. (Production Index from Research
Department of Federal Reserve Bank of New York. Population data from United
States Bureau of the Census, Abstract of the Fifteenth Census of the United States, 1930,
p. 9. 1940 population estimate from Table 88.)

In addition to these factors which affect the growth of all industries,
we find that some industries wax or wane because of changes in demand.
New commodities may attract favor and replace old ones fulfilling a
similar need, as the automobile did the horse and buggy. No doubt the
automobile has also drawn purchasing power away from commodities
catering to quite different desires. Demand may also be drawn off, 
though desire may be undiminished, through the appearance of a less 
attractive, though cheaper, substitute. Thus rayon is partially replacing 
silk. More spectacular is the development of the railroads, forcing into 
obsolescence many canals, only to have their traffic more recently diverted 
by competition of trucks, buses, and airplanes. Although physical pro- 
duction in the United States as a whole seems to have been increasing, 
until very recently, at a fairly constant rate, many industries, such as 
pig iron production, shown with a logarithmic vertical scale in Section A 
of Chart 139, seem to be characterized by a rather steadily declining rate 
of growth. This may be due to a combination of a number of reasons. 
Improvements in the productive process are rapid at first, but as time 
goes on it is possible that further improvements have less and less effect 
upon the output. Again, growth may be retarded by the increasing dif- 
ficulty of obtaining supplies, such as minerals, for mining becomes more 
and more difficult as the better ores are exhausted. Further, during the 
period while difficulty is experienced in keeping up with demand, profits 
will be high and it will be easy to expand productive equipment. But 
after a while recourse must be had to the open market for funds.
Eventually the point will be reached where funds, though forthcoming in large
amounts, are small relative to the size of the business. Finally, as con-
sumer desire in old markets becomes more nearly satisfied, relative to 
that for other commodities, it becomes increasingly difficult to entice 
buyers from competing products, and new markets may not appear. 

Many authorities think that not only does the rate of growth decrease, 
but eventually further expansion will be physically impossible. Raymond 
B. Prescott has characterized the tendency we have described as a "law
of growth," which applies to all industries. This law embraces four
stages: (1) period of experimentation during which the amount of growth
is small; (2) period of growth into the social fabric; (3) period during
which growth is retarded as saturation point is approached; (4) period of
stability. Section B of Chart 139 (on arithmetic paper) indicates that
the trend of pig iron production answers this description also. Although
it is quite apparent by inspection of Sections A and B of Chart 139 that
a trend with a decreasing rate of increase may be one which varies in
amount of growth, as described by Prescott, nevertheless the former type
need not decline in amount of growth during its latter stages, nor need a
curve which in its early stages is growing by increasing amounts be also
one which is increasing at a decreasing rate.

"Law of Growth in Forecasting Demand," by Raymond B. Prescott. Journal of
the American Statistical Association, December 1922, Vol. XVIII, pp. 471-479.
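The distinction between amount of growth and rate of growth can be seen in a small numerical sketch. The series below is purely hypothetical (output equal to the square of the year number) and is chosen only because it grows by ever larger amounts while its percentage rate of growth falls steadily.

```python
# Hypothetical series, not taken from the text: output = (year number) squared.
series = [year ** 2 for year in range(1, 9)]

# Amount of growth: absolute change from one year to the next.
amounts = [later - earlier for earlier, later in zip(series, series[1:])]

# Rate of growth: the same change as a percentage of the earlier year.
rates = [100.0 * (later - earlier) / earlier
         for earlier, later in zip(series, series[1:])]

for year, (amount, rate) in enumerate(zip(amounts, rates), start=2):
    print(f"year {year}: output {series[year - 1]:3d}, "
          f"amount of growth {amount:2d}, rate of growth {rate:6.1f}%")
```

On arithmetic paper such a series would curve upward, while on a logarithmic vertical scale it would flatten, which is why the two sections of a chart can suggest different descriptions of the same trend.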





Chart 139. Pig Iron Production in the United States, 1860-1937, and Secular Trend:
A. Logarithmic Vertical Scale; B. Arithmetic Vertical Scale. [1860-1918 data from
Interior Department, United States Geological Survey, Mineral Resources of the United
States, 1918, Part I (Metals), p. 566. 1919-1937 are annual totals of monthly
production; source, The Iron Age, as quoted in Standard Statistics Co., Standard Trade and
Securities, Basic Statistics, Vol. 80, June 5, 1936, p. G-5, and Current Statistics, Vol. 90,
October 14, 1938, p. 23.]

There is a difference of opinion whether price level changes may be
said to have a trend. Probably, however, it is useful to study both long
time and short time changes in price. The former are due to such factors 
as variations in the stock of gold and in coinage laws, and to changes in 
business methods, in personal monetary habits, and in banking technique. 
The latter are subject to quite different influences. Thus, although trends 
in price are the result of factors other than those affecting production 
trends, and therefore behave differently from them, they are both long 

time changes and both are norms around which other movements fluctu- 
ate. 

The statistical problem is, first, to decide what type of trend fits the
data closely and is a logical description of them, and, second, to fit the
trend of the type decided upon. Such a trend is not only an expression
of normal² tendencies; it also provides a base from which to measure
deviations.

Cyclical movements. Business cycles are a type of fluctuation lasting 
longer than one year that tend to recur with a measure of regularity in 
economies organized on a business basis. Such movements are called 
cyclical rather than periodic because they do not occur with complete 
regularity as to duration. On the other hand, they are cyclical rather 
than random movements because the position of business in the cycle is 
affected by the position of business in recent months and, in turn, affects 
business in the more immediate future. In other words, the transition 
from a low point to a high point, or vice versa, is a progressive develop- 
ment. Cycles appear to operate somewhat on the principle of a pendulum. 
Just as a pendulum is pulled by gravity toward a vertical position, but
tends constantly to move past its position of equilibrium, so it is said that
business is drawn toward an equilibrium by the forces of demand and
supply, and so also do the errors in one direction tend to progress into
errors in the opposite direction. Such an explanation of business cycles
is known as the "self-generative" theory, usually associated with the name
of Wesley C. Mitchell. But just as the mechanism impelling a pendulum
must be wound up occasionally, so it is possible that economic activity 
would attain equilibrium were it not for other propulsions of varying 
degrees of intensity. It is possible to speak of cycles in general business
or of cycles in particular industries, such as residential construction, cattle
raising, or textile production. Occasionally cycles in a specific industry
appear to be inherently periodic, as in the case of the two-year cycles in
rayon consumption. In any event they are modified by the position of
the general cycle. However, since all industries are so interdependent, a
revival or recession in a key industry or group of industries soon transmits
its effect to the other branches of activity.

2 The reader should not confuse the statistical meaning of the word "normal," which
is a sort of average, with another meaning of the same word, as used in theoretical
economic analysis, to designate the situation which would exist if the economic system
were in equilibrium.

Chart 140. Cyclical Movements of Pig Iron Production, and Federal Reserve Bank of
New York Index of Production and Trade, 1919-1937. (For source of pig iron data see
Chart 139; trend and seasonal influences have been removed and the adjusted data
smoothed slightly to reduce minor fluctuations. Production and Trade Index is from
Federal Reserve Bank of New York.)

It appears that cyclical movements of general activity could be gener- 
ated by a concurrence of the same cyclical phase in the activity of several 
important industries; or they might be generated by interferences from
outside the business world. These interferences might be occasional events 
of considerable magnitude, such as a war, a discovery, unusual weather, 
or some political event; or they might be the simultaneous occurrence of 
several minor events, each reinforcing the effect of the other. 

The rough regularity of cycles may possibly be explained by the perio- 
dicity of certain of the extraneous events which, some authorities believe, 
are in part responsible. Cycles in weather have been suggested. It is 
more likely, however, that what regularity can be observed is due to the
fairly constant length of time it takes the business world to respond to 
stimuli. For instance, the time it takes for erecting a building or for 
foreclosing a mortgage, or even to decide to go into bankruptcy, is not 
utterly irregular. Perhaps greater regularity would be observable were 
it not for the irregularity of accidental occurrences. 

There are some who reject the concept of self-generating cycles, believ- 
ing that cycles are brought about largely by external influences. Even 
these, however, are interested in observing whether production and con- 
sumption are increasing or decreasing, and especially in taking practical 
measures for stabilization. Whether self-generated or entirely caused by
non-business occurrences, it can be seen, from Chart 140, that pig iron 
production has experienced recurring depression every three or four years 
since 1919. Furthermore, the variations are very similar to those of the 
total volume of trade. It must, of course, be recognized that pig iron 
production is one of a large number of series represented in the total volume 
of trade. The greater amplitude of fluctuation is partly a characteristic 
of pig iron, a producer's good, and is partly due to the fact that an index 
composed of several series whose turning points occur at slightly different 
periods of time always cancels out some of the amplitude of the constituent 
series by the averaging process. Although the average length of time from 
depression to depression of pig iron as shown by this chart is about 45 
months, there is considerable variability in the duration of the different
cycles. Also, it might be noted that there is considerable difficulty in
deciding just what is a cycle. Were, for instance, the slight recessions in 
1925 and 1934 cycles or large irregular occurrences? If cycles, the average 
length of cycle was much less than 45 months. 
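
The damping effect of averaging can be illustrated with a small sketch. The three sine waves below are hypothetical constituent series, not data from Chart 140: each has unit amplitude and a 45-month period (borrowed from the average length quoted above), and their turning points are assumed to be staggered by three months.

```python
import math

PERIOD = 45.0            # months; the average cycle length quoted in the text
LAGS = [0.0, 3.0, 6.0]   # hypothetical stagger, in months, between turning points


def component(t, lag):
    """One hypothetical constituent series: a unit-amplitude sine wave."""
    return math.sin(2 * math.pi * (t - lag) / PERIOD)


def index_value(t):
    """Unweighted average of the constituent series at month t."""
    return sum(component(t, lag) for lag in LAGS) / len(LAGS)


# Sample two full cycles in tenths of a month and find the largest swing.
times = [k / 10 for k in range(0, 901)]
index_amplitude = max(abs(index_value(t)) for t in times)

print("amplitude of each series: 1.000")
print(f"amplitude of the index:   {index_amplitude:.3f}")
```

The index swings through a somewhat narrower range than any one of its constituents, and the loss grows as the turning points are spread further apart.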

Periodic movements. As distinguished from cyclical movements, which
have only rough regularity, many time series have variations which repeat
themselves with remarkable similarity at regular intervals. Chart 141
shows the variation in the number of automobile injuries in New York
City during different hours of the day. An example of a type of movement
that repeats itself each week is the circulation of books in the reserve
room of a university library. (See Chart 173 in Chapter XVII, dealing
with periodic movements.) A still longer periodicity is the intra-month
variety. Thus bank debits have a tendency to reach a peak shortly after
the first of each month.

Chart 141. Average Hourly Number of Automobile Injuries that Occurred in New
York City During the Six-Month Period From January to June, 1937. (Vertical axis:
number of accidents; horizontal axis: hour of the day, A.M. and P.M. Traced from
chart appearing in The New York Times, September 1, 1937.)

The type of periodic movement which has engaged much of the attention
of economists, however, is that which has a period of one year, and is
commonly known by the term seasonal variation. (See Chart 142.) Climatic
conditions, such as variations in rainfall, snow and ice, sunshine,
humidity, heat, and wind produce variations in demand which often reflect
themselves in variations in production, and also directly affect production
in such occupations as agriculture and building construction. Social
conventions also have their influence, the Christmas trade being a notable
instance. To some extent holidays are not entirely independent of seasonal
factors. Easter and Thanksgiving owe their origin at least in part
to weather conditions. Also, it might be noted that man's propensity for
ostentation leads him to change his style of car or coat each year at the
proper season. These style changes greatly accentuate the seasonal variations
for which nature is primarily responsible.

For statistical analysis it may be desirable to calculate seasonal varia- 
tions on either a monthly or a weekly basis. On both of these bases, 
typical variations in steel mill capacity may be seen in Chart 142.

It is worth observing that a seasonal pattern may change either gradu- 
ally or suddenly with the passage of years. Thus, a study of the seasonal 

Chart 142. Typical Monthly and Weekly Variations During a Year in Per Cent of
Capacity of Steel Mill Activity. (Vertical axis: per cent. Computed from Wall Street
Journal data as supplied by Standard Statistics Co.)

curve of Chart 143A indicates that while pig iron is now typically at a
low point during the winter months, this has not always been the case.
On the other hand, Chart 187, page 507, indicates that the Christmas trade
of department stores has become increasingly more important. In the
automobile industry, however, there was a sudden shift in 1935
during the time when new models were introduced, with a consequent effect
on production schedules. This may be clearly seen from Chart 192, page
517. Some series retain the same general pattern but change in intensity
gradually or irregularly from year to year. This is particularly true of
agricultural series. See, for instance, Chart 193, page 520, dealing with
receipts of sheep and lambs at primary markets. Still other series may
retain a constant seasonal pattern, but exhibit peaks and troughs at different
months in different years, because of early or late seasons.

Irregular variations. There are other variations which are not covered 
by the above classification. No theory seeks to explain these variations, 
and they may be considered as accidental from the point of view of the 
theorist. They may be minor fluctuations, perhaps of a random nature, 
too small to be worth considering individually, or they may be important 
episodic events, such as wars, earthquakes, or general strikes. As sug- 
gested above, these episodes may be so important as to generate, or assist 
in generating, a cyclical fluctuation, and occasionally it may be difficult to 
distinguish an episode from a cycle.

A Graphic Illustration 

Perhaps the nature of the movements will be more clearly understood if
they can be seen graphically in one chart. The different movements for
pig iron are shown separately by the different curves of Chart 143A. The
upper curve represents the secular trend and will be recognized as a fragment
of Chart 139A. It is not nearly so steep as it was in earlier years.
The curve immediately below is an estimate of the cyclical movements of
pig iron. The very large amplitude of these variations is immediately
apparent. Of much smaller amplitude is the seasonal variation, shown
as the third curve in this section. As can be seen, the seasonal pattern
is not stable but is gradually changing. The irregular movements are
indicated by the curve at the bottom. Their amplitude is, in general,
rather large in comparison with that of the seasonal variation, although
they seem quite modest during the middle portion of the period covered.
It should not, of course, be assumed that the relative amplitude of the
different movements is the same for other series as it is for pig iron
production. Each series is characterized differently in this respect.

Chart 143A. Graphic Analysis of Variations in Pig Iron Production in the United
States, 1919-1937. (Logarithmic vertical scale. For source of data from which these
variations were estimated see Chart 139.)


Chart 143B. Graphic Synthesis of Variations in Pig Iron Production in the United
States, 1919-1937. (Logarithmic vertical scale. For source of data from which these
variations were estimated see Chart 139.)

Turning to Chart 143B, we find a progressive synthesis of the different
elements beginning with the trend and ending with the original data. The
chart is semi-logarithmic, so that any curve in Chart 143B can be obtained
by graphically adding the different curves of Chart 143A, beginning at the
top and ending with the one to the left of the one being synthesized. (Or,
any curve of Chart 143B can be obtained by a graphic summation of two
curves, the one immediately above it and the one immediately to its left
in Chart 143A.) It is apparent also that, if we should add for any month
the logarithms of the values represented by every curve of Chart 143A,
we would obtain the logarithm of the corresponding monthly value of the
original data represented by the T × C × S × I line of Chart 143B. This
is equivalent to saying that the original data are the product of trend,
cycle, seasonal, and irregular movements,* or

Original data = T × C × S × I.

TABLE 77

Pig Iron Production as Synthesized from Its Estimated Component Elements, 1936

              Secular trend   Cyclical     Seasonal     Irregular    Actual production
              (millions of    movements    movements    movements    (millions of
Month         long tons)      (ratios)     (ratios)     (ratios)     long tons)
              T               C            S            I            T × C × S × I

January       3.358           .650         .940         .9869        2.025
February      3.361           .650         .950         .8788        1.824
March         3.364           .650         1.080        .8638        2.040
April         3.367           .650         1.080        1.0171       2.404
May           3.370           .667         1.150        1.0244       2.648
June          3.374           .701         1.050        1.0413       2.586
July          3.377           .751         1.025        .9979        2.594
August        3.380           .811         1.005        .9841        2.711
September     3.383           .874         .980         .9421        2.730
October       3.386           .929         .970         .9805        2.992
November      3.389           .970         .900         .9962        2.947
December      3.392           1.000        .870         1.0555       3.115

Source: T, C, S, and I computed from actual production. Actual production: 1919 to date,
Standard Statistics Co., Standard Trade and Securities, Basic Statistics, Vol. 80, June 5, 1936,
p. G-5, and Current Statistics, Vol. 89, August 1938, p. 23.
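
The synthesis in Table 77 can be checked with a short sketch. It assumes the trend and production figures are read as millions of long tons (3.358, 2.025, and so on) and uses three months copied from the table; the function names are illustrative only. The second function adds logarithms, which is what graphic addition on a semi-logarithmic chart amounts to.

```python
import math

# (T, C, S, I, actual) for three months of 1936, copied from Table 77.
# T and actual production are in millions of long tons; C, S, I are ratios.
TABLE_77 = {
    "January": (3.358, 0.650, 0.940, 0.9869, 2.025),
    "June": (3.374, 0.701, 1.050, 1.0413, 2.586),
    "December": (3.392, 1.000, 0.870, 1.0555, 3.115),
}


def synthesize(t, c, s, i):
    """Multiplicative model: original data = T x C x S x I."""
    return t * c * s * i


def synthesize_via_logs(t, c, s, i):
    """Same result obtained by adding logarithms and taking the antilog."""
    log_sum = math.log10(t) + math.log10(c) + math.log10(s) + math.log10(i)
    return 10 ** log_sum


for month, (t, c, s, i, actual) in TABLE_77.items():
    product = synthesize(t, c, s, i)
    via_logs = synthesize_via_logs(t, c, s, i)
    print(f"{month:9s} product={product:.3f} via logs={via_logs:.3f} table={actual:.3f}")
```

For each month the product of the four components agrees with the tabulated production to the third decimal place.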


* It is also possible to consider that

Original data = T + C + S + I.

That is not, however, such a generally useful concept, since C, S, and I tend to remain
about constant in magnitude relative to trend. This makes it possible to compute a
seasonal index which remains uniform over a period of years; or to compare the
percentage fluctuations of cyclical variations. But there are series which give better
results when seasonal is considered as constant in absolute, rather than relative,
magnitude. See pages 525-527 for further discussion of this point.

Charts 143A and 143B illustrate the various individual movements of
which pig iron production is composed, as well as a number of combinations
of movements. Sometimes we wish to study the trend alone, sometimes
the seasonal, frequently the combination of trend and cycle, and
possibly most often of all the cyclical movements. Inspection of Chart
143B and of Table 77, in which the movements of pig iron production are
synthesized from the elements of the series, should also explain the logic
of the usual method of analyzing time series to obtain cyclical movements
as a final product. When analyzing, we first estimate by statistical
methods: (1) trend, and (2) seasonal variation; then the data are divided
through successively by the trend values and the seasonal values (or vice
versa); finally, irregular variations are smoothed out, leaving only the
cycles. These processes will be explained in subsequent chapters.

Chart 144. Irregular Variations in Pig Iron Production Classified by Magnitude of
Deviation. (Vertical axis: number. This frequency distribution was obtained by
grouping the irregular deviations of Chart 143A.)
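
The order of operations just described (divide through by trend and seasonal, then smooth) can be sketched with the 1936 figures of Table 77. The three-month centered moving average used for the smoothing step is only an assumption made for illustration; the text reserves its own smoothing methods for later chapters, and the function names are hypothetical.

```python
# Actual production, trend, and seasonal ratios for 1936, from Table 77
# (production and trend in millions of long tons).
ACTUAL = [2.025, 1.824, 2.040, 2.404, 2.648, 2.586,
          2.594, 2.711, 2.730, 2.992, 2.947, 3.115]
TREND = [3.358, 3.361, 3.364, 3.367, 3.370, 3.374,
         3.377, 3.380, 3.383, 3.386, 3.389, 3.392]
SEASONAL = [0.940, 0.950, 1.080, 1.080, 1.150, 1.050,
            1.025, 1.005, 0.980, 0.970, 0.900, 0.870]


def remove_trend_and_seasonal(actual, trend, seasonal):
    """Divide through by T and S, leaving the combined C x I ratios."""
    return [y / (t * s) for y, t, s in zip(actual, trend, seasonal)]


def centered_moving_average(values, span=3):
    """Smooth with a centered moving average of odd span; ends are None."""
    half = span // 2
    smoothed = [None] * len(values)
    for k in range(half, len(values) - half):
        window = values[k - half:k + half + 1]
        smoothed[k] = sum(window) / span
    return smoothed


cycle_irregular = remove_trend_and_seasonal(ACTUAL, TREND, SEASONAL)
cycle_estimate = centered_moving_average(cycle_irregular, span=3)

for ci, c in zip(cycle_irregular, cycle_estimate):
    shown = "  n/a" if c is None else f"{c:.3f}"
    print(f"C x I = {ci:.3f}   smoothed = {shown}")
```

The first column reproduces the product of the tabulated C and I ratios; the second is a rough estimate of the cycle alone, which a longer series and a better smoothing formula would refine.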

In Chart 144 the irregular movements are shown as a frequency curve.
The distribution plainly is leptokurtic. If the irregular movements had
been of a random character, we would have expected a normal distribution,
but in addition to minor fluctuations we have others that are episodic in
character and whose effects are cumulative over several months. Note
that in Chart 143A the irregular movements sometimes remain on one
side of the 100 per cent line for more than six months in succession.
Actually, the interdependence of successive observations in a time series
destroys the random character of any of its movements, regardless of the
character of the factors causing such movements.
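
The persistence of the irregular movements on one side of the 100 per cent line can be measured by counting runs. The sketch below applies such a count to the twelve irregular ratios of Table 77; a single year cannot show the longer runs visible in Chart 143A, so this only illustrates the check, and the function name is hypothetical.

```python
# Irregular ratios for 1936 from Table 77 (1.0 is the 100 per cent line).
IRREGULAR = [0.9869, 0.8788, 0.8638, 1.0171, 1.0244, 1.0413,
             0.9979, 0.9841, 0.9421, 0.9805, 0.9962, 1.0555]


def longest_run_one_side(ratios, center=1.0):
    """Length of the longest stretch of consecutive values all above, or all
    below, the center line. A value exactly on the line breaks a run."""
    longest = current = 0
    previous_side = 0
    for r in ratios:
        side = (r > center) - (r < center)   # +1 above, -1 below, 0 on the line
        if side != 0 and side == previous_side:
            current += 1
        else:
            current = 1 if side != 0 else 0
        longest = max(longest, current)
        previous_side = side
    return longest


print(longest_run_one_side(IRREGULAR))
```

Here the longest run is the five months from July through November, all below the line.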

The different movements which we have discussed (trend, cycles, seasonal
movements, and irregular fluctuations) are of varying importance
in different series. At the present time the outstanding feature of such a 
series as rayon consumption is its steep secular trend. Durable goods, 
such as pig iron, fluctuate tremendously with the course of business cycles; 
still others, such as department store sales, do not show a steep trend or 
pronounced cycles, but exhibit intense seasonal fluctuations. Individual 
series, particularly those with a narrow or specialized market, often are 
quite irregular in their variations, but broad group indexes present smoother 
curves when seasonal movements have been removed. 

Another View of Time Series Movements 

It is probably an over-simplification to say that there are only four types of
movements discernible in time series. Some analysts would say that time series
are composed of wave-like movements of various lengths superimposed upon each
other (including, of course, those already discussed). Some of these movements
have been the subject of investigation by economists. Thus Kondratieff* has
discovered "long cycles" lasting roughly 50 years, running through many series
and in a number of countries. He lists these waves as follows:


Wave      Low           High

I         1780-1790     1810-1817
II        1844-1851     1870-1875
III       1890-1896     1914-1920


He finds that prices fall and agriculture suffers during the decline, and also that
scientific discoveries are made during this period. At the beginning of the upswing,
colonies are acquired and new sources of gold found; during the upswing,
scientific discoveries are applied, and there are wars and revolutions. These swings
are held to be cyclical in character since the factors with which they are associated
are, in part, at least, the result of the preceding phase of the long cycle. Thus
falling prices, which are associated with the downswing, lead to a search for new
technical methods to lower the cost of production, and for the increasingly valuable
commodity, gold.

Kuznets has extensively studied another type of fluctuation, intermediate in
length between Kondratieff's long cycles and the ordinary business cycle. These
waves he calls "secondary trends."† The heavy solid line of Chart 145 shows
such fluctuations around the primary trend line, while in Chart 146 they are
shown as percentages of primary trend. It may be that these secondary trends
are a disequilibrium in production resulting mainly from a disequilibrium in prices.
It is commonly believed that rising prices stimulate business. This is largely
because many of the businessman's expenses lag in changes behind his selling price.
If his selling price merely kept pace with wages and other costs, his profits would
not increase, but when prices of a particular commodity are above normal, then
it is profitable to expand. The initial stimulant of abnormal prices tends to be
stretched out over more than one cycle. Price movements themselves may be
initiated by such factors as variations in the volume of gold production or changes
in banking technique, which phenomena are often gradual developments that are
in progress for a considerable period of time. The plausibility of the theory is
indicated by comparing the secondary trends in production and prices of pig iron
shown in Chart 146. To make secondary trends fall into the self-generating
category, it is necessary to show that the factors generating the price rise are
attributable to the preceding downswing of prices.

Chart 145. Pig Iron Production, Primary Trend, and Primary × Secondary Trend
by Years, 1860-1937. (Vertical axis: millions of tons. For source of data see Chart 139.)

Chart 146. Secondary Trends in Production and Prices of Pig Iron in the United
States, 1860-1937. (Horizontal axis: years, 1860 to 1940. Secondary trends are
binomially weighted 15-year moving averages; 1860 and 1931-1937 trend values are
estimates. For source of production data see Chart 139. Pig iron prices, 1860-1913,
are from Simon Kuznets, Secular Movements in Production and Prices, Houghton
Mifflin Co., Boston, 1930, pp. 364-365; for the years 1862-1868 the prices have been
reduced to a gold basis. For 1914-1937 basic prices have been spliced to Kuznets'
No. 1 foundry prices. These are from The Iron Age as quoted by Standard Statistics
Co., Standard Trade and Securities, Basic Statistics, Vol. 80, June 6, 1936, p. G-6,
and Current Statistics, Vol. 90, October 14, 1938, p. 23.)

* See "The Long Waves in Economic Life," by N. D. Kondratieff, The Review of
Economic Statistics, November 1935, pp. 105-115 (translated by W. F. Stolper).

† See Simon S. Kuznets, Secular Movements in Production and Prices, Houghton
Mifflin Company, Boston, 1930.

Long cycles, secondary trends, and business cycles are each a combination of
outside influences and business responses; but in the first of the three it may be
that the former predominates, while in the third the self-generative aspect may
be most predominant. These three movements merge into one another and in
actual practice it is difficult to separate them. Should, for instance, the depression
following 1929 be regarded as a very severe depression, or was it the coincidence
of a business depression with the low point of a secondary trend? Possibly, future
years will establish that a new primary trend is appropriate for the years following,
say, 1931. Furthermore, it is difficult to say when a short wave-like movement
is a cycle and not an accidental variation. This cannot be decided by the criterion
of cause, for many cycles are partly caused by forces external to business. More
reasonable is to consider as cycles only those movements which extend over a
wide area of our business life. But the question immediately rises: How wide?
Was, for instance, the wave that rose in the spring of 1933 and reached a trough
in the fall of 1934 a cycle? The answer is, of course, subjective, and seems to boil
down to saying that any wave may be considered a cycle if we wish to study its
cyclical characteristics. As a guide to economists and statisticians, Willard L.
Thorp has studied the economic history of various countries, and on the basis of
all evidence, statistical and otherwise, has subdivided the period since 1790 into
sub-periods according to the different phases of the business cycle. His findings
are published in Business Annals.* More difficult of identification than the cycles
of general business are those pertaining to a particular industry. Attention has
already been called to the difficulty of identifying cycles in pig iron production.

Preliminary Treatment of Data

Some variations in time series may be due to the terms in which they
are stated. Before attempting to isolate for purposes of study the movements
which have been described, it may be well to restate the data in
more significant terms.

* Published in 1926 by the National Bureau of Economic Research, New York. This
study is kept up to date by the National Bureau, with results published in occasional
bulletins.

Calendar variation. Usually, though not always, there are 365 days in 
a year. Although there are 12 months in each year, they vary in length 
from 28 to 31 days. To make matters more complicated, the different 
months do not start on the same day of the week, nor does the same month 
in successive years so start. Another difficulty is the matter of holidays.
Not only do the number of Saturdays and Sundays vary as between 
months, but February, with 28 or 29 days, has Washington's birthday 
and Lincoln's birthday, while March, with 31 days, usually includes no 
holidays. Even more confusing is the way Easter fluctuates between 
March and April. Thus do the different months vary extremely as to the 
number of working days. 

Although it seems impossible to divide the year into quarters con- 
taining the same number of whole weeks, nevertheless some business firms 
have tried to minimize the difficulty. A few firms keep records by 4-week 
periods. There are 13 such periods in a year, but quarterly data cannot 
be kept by this system. A few others keep records by quarters, each 
quarter being composed of three months — the first two months of four 
weeks each and the third of five weeks. Of course, neither of these plans 
is satisfactory so long as the first of a given calendar month may occur in 
either of two artificial months. And under any plan the unsystematic 
occurrence of holidays results in a different number of working days in 
successive artificial months. Movements have been launched to change 
the calendar to remedy these defects. One plan suggests identical quar- 
ters; each quarter would contain, not identical months, but three monthly 
patterns of thirty or thirty-one days each, these three patterns being re- 
peated so as to occur four times a year. An extra day, however, known 
as Year Day, would occur at the middle of the year. 

But until people can be persuaded to change their established customs
sufficiently to change their calendar, the statistician is confronted with the
problem of adjusting for calendar variation. The method is very simple.
Using electric power production in 1930 as an illustration, and assuming
that adjustment for the number of calendar days is sufficient, we may
divide the values of each month by the number of days in that month,
thus expressing the data as millions of kilowatt hours per calendar day.
This procedure is followed in Table 78, the results being recorded in
column 4. The data so adjusted will be used in the chapter on periodic
movements. If it is desired to retain the figures in their original magnitude,
the figures as shown in column 4 must be multiplied by 365 ÷ 12 =
30.4167, the average number of days in a month. Or, the same result
can be obtained by dividing the original data by the ratio of the actual
to the average number of days in each month. These ratios are shown
in column 5 of Table 78, and the results in column 6. These data are now
spoken of as production "adjusted" for the number of calendar days, or
for calendar-day variation. That the data so adjusted are less irregular
is easily seen from Chart 147.


TABLE 78

Adjustment of Electric Power Production for Number of Calendar Days in
Each Month, 1930

(Millions of kilowatt hours)

                                       Production per       Ratio of actual   Production adjusted
             Actual                    calendar day         to average        for calendar days
Month        production   Calendar     [Col. 2 ÷ Col. 3]    calendar days*    [Col. 2 ÷ Col. 5]
                          days
(1)          (2)          (3)          (4)                  (5)               (6)

January      8,663        31           279.5                1.01918           8,500
February     7,627        28           272.4                 .92055           8,285
March        8,187        31           264.1                1.01918           8,033
April        8,019        30           267.3                 .98630           8,130
May          8,064        31           260.1                1.01918           7,912
June         7,784        30           259.5                 .98630           7,892
July         7,899        31           254.8                1.01918           7,750
August       7,906        31           255.0                1.01918           7,757
September    7,792        30           259.7                 .98630           7,900
October      8,195        31           264.4                1.01918           8,041
November     7,693        30           256.4                 .98630           7,800
December     8,108        31           261.5                1.01918           7,955

* Column 3 divided by 30.4167 = 365 ÷ 12.

Source: United States Department of Commerce, Survey of Current Business, 1936 Supplement, p. 85.
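
The arithmetic behind columns 4, 5, and 6 of Table 78 can be reproduced in a few lines. This is a minimal sketch using the production figures of column 2; the function name is illustrative, and the number of days in each month is taken from Python's standard `calendar` module rather than typed in by hand.

```python
import calendar

# Actual electric power production, 1930, millions of kilowatt hours (Table 78).
PRODUCTION_1930 = [8663, 7627, 8187, 8019, 8064, 7784,
                   7899, 7906, 7792, 8195, 7693, 8108]
AVERAGE_DAYS = 365 / 12   # 30.4167; 1930 was not a leap year


def adjust_for_calendar_days(produced, year, month):
    """Return (days, production per day, ratio to average days, adjusted)."""
    days = calendar.monthrange(year, month)[1]   # calendar days in the month
    per_day = produced / days                    # column 4
    ratio = days / AVERAGE_DAYS                  # column 5
    adjusted = produced / ratio                  # column 6
    return days, per_day, ratio, adjusted


for month, produced in enumerate(PRODUCTION_1930, start=1):
    days, per_day, ratio, adjusted = adjust_for_calendar_days(produced, 1930, month)
    print(f"{calendar.month_name[month]:9s} {days:2d} {per_day:6.1f} "
          f"{ratio:.5f} {adjusted:6.0f}")
```

Column 6 is simply column 4 multiplied by 30.4167, so either route gives the same adjusted figures.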


If, however, it is desired to adjust for the number of working days, a
little more labor is required. The procedure is as follows:

(1) Ascertain the schedule of holidays appropriate to the industry.
This, of course, varies with industries and localities.*

(2) Count the number of Sundays in each month of each year. If
Saturday is not a working day, the number of Saturdays must be counted
also. If a half holiday, Saturday is given a weight of one-half.

(3) Count the number of holidays in each month of each year. Perhaps
some holidays will be given half weight.

(4) Add the number of holidays and the number of Sundays (and perhaps
Saturdays) for each month. For many industries an extra holiday
must be added if a regular holiday occurs on Sunday.

(5) Obtain the number of working days for each month by subtracting
the number of holidays, Sundays (and perhaps Saturdays) from the number
of calendar days.

(6) Divide the original data by the number of working days each
month; or adjust for working days by dividing by the ratio of the actual
to the average number of working days.

The laborious part of this procedure consists, in large degree, in the
necessity of computing the number of working days. To facilitate
discovering the number of Saturdays and Sundays in different months, as
well as the months in which principal holidays occur on a Saturday or a
Sunday, a flexible calendar has been included as Appendix K.

* For a schedule of holidays by states, see "Legal Holidays in the United States, 1936,"
Monthly Labor Review, Vol. 34, No. 5, November 1936, pp. 1193-1196.
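
Steps (2) through (5) above lend themselves to mechanical counting. The following is a minimal sketch, not a substitute for a proper holiday schedule: the function name, its parameters, and the choice of holidays in the example are assumptions made for illustration, and half-weight holidays are not handled.

```python
import calendar
from datetime import date, timedelta


def working_days(year, month, holidays, saturday_weight=1.0, observe_monday=False):
    """Count working days in a month.

    holidays        -- iterable of datetime.date objects (a hypothetical schedule)
    saturday_weight -- 1.0 if Saturday is a full working day, 0.5 for a half
                       holiday, 0.0 if Saturday is not worked
    observe_monday  -- if True, a holiday falling on Sunday also closes the
                       following Monday
    """
    closed = set(holidays)
    if observe_monday:
        closed |= {h + timedelta(days=1) for h in closed if h.weekday() == 6}

    total = 0.0
    for day in range(1, calendar.monthrange(year, month)[1] + 1):
        d = date(year, month, day)
        if d in closed or d.weekday() == 6:      # holiday or Sunday
            continue
        total += saturday_weight if d.weekday() == 5 else 1.0
    return total


# February 1930 with Lincoln's and Washington's birthdays treated as holidays.
feb_holidays = [date(1930, 2, 12), date(1930, 2, 22)]
print(working_days(1930, 2, feb_holidays))                       # Saturdays worked in full
print(working_days(1930, 2, feb_holidays, saturday_weight=0.5))  # Saturday a half holiday
```

Step (6) then divides each month's figure by the count returned here, or by the ratio of that count to the average number of working days per month.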

Chart 147. Electric Power Production, by Months, 1930, Before and After Adjustment
for Varying Number of Calendar Days in a Month. (Vertical axis: millions of
kilowatt hours. Data of Table 78.)

Not all time series require adjustment for calendar variation. Clearly
it would be spurious to do so for salary expenses of most corporations,
since salaries of executives are usually constant from month to month.
But for data requiring such adjustment it is frequently a difficult statistical
problem to decide whether to adjust for working days, or merely for
calendar days. For some commodities it can logically be maintained that
holidays within a month, far from decreasing consumer purchases during
that month, may actually increase them. If the holiday occurs on the
last day of the month and the stores are closed, however, it might decrease
sales. In organizations which receive orders through the mail from a
considerable distance, sales may be decreased as well by holidays occurring
during the last few days of the preceding month. Just what is the logical
adjustment to make is often very difficult to determine and requires
familiarity with the business or industry in question. In case of doubt it
is always possible to determine experimentally what method gives the
smoothest results after the adjustment is made. Such a test provides no
conclusive evidence but is only presumptive. Frequently, a separate
adjustment should be made for Easter, as explained on pages 509-515.

Population changes. Since one element in primary trend is population 
change, it may be worth while to adopt the old military axiom of divide 
and conquer by expressing the data on a per capita basis. Or, the data 
may be adjusted for population changes by dividing by population figures 
relative to some base. This is done for Barron's Index of Production and
Trade in Table 91, page 415. The mechanical process is to divide the 
original data by the population figures. Naturally the remaining trend has 
quite a different character. Frequently it is simpler than before. 
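
The two forms of population adjustment mentioned here differ only in scale. The sketch below shows both; the output and population figures are hypothetical and are not taken from Table 91 or any other table in the text, and the function names are illustrative.

```python
def per_capita(values, population):
    """Divide each period's figure by that period's population."""
    return [v / p for v, p in zip(values, population)]


def adjust_to_base_population(values, population, base_index=0):
    """Divide by population expressed relative to a base period, which keeps
    the adjusted figures in the units of the original series."""
    base = population[base_index]
    return [v / (p / base) for v, p in zip(values, population)]


# Hypothetical figures for three successive years (not taken from the text).
output = [200.0, 212.0, 225.0]        # physical volume, arbitrary units
population = [100.0, 102.0, 104.0]    # millions of persons

print(per_capita(output, population))
print(adjust_to_base_population(output, population))
```

Both adjusted series rise more slowly than the original, which is the sense in which the remaining trend has a different, and often simpler, character.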

TABLE 79

Factory Average Hourly Earnings (25 Industries) and Cost of Living, 1929-1937

                                               Hourly real wage
           Hourly wage     Cost of living      (cents)
Year       (cents)         (1929 = 100)        [Col. 2 ÷ Col. 3]
(1)        (2)             (3)                 (4)

1929       59.0            100.0               59.0
1930       58.9             96.6               61.0
1931       56.4             87.1               64.8
1932       49.8             77.8               64.0
1933       49.1             74.8               65.6
1934       58.1             79.3               72.3
1935       60.0             82.5               72.7
1936       61.7             84.6               72.9
1937       69.3             88.4               78.4

Source: Survey of Current Business, 1936 Supplement, pp. 11, 41; March 1937, pp. 23, 31; March 1938,
pp. 63, 71. Cost of living index base has been shifted from 1923 to 1929 for this illustration.


Price changes. Since economists are usually interested in physical vol- 
ume changes rather than value changes, it is often necessary to convert 
the figures into this form. The process is generally called deflating. It 
consists in dividing the value figures, period by period, by the appropriate 
price index figures, as illustrated by Table 79, which shows figures on 
hourly wages and cost of living. The logic of this procedure is simple. 
Since price times quantity equals value, quantity must equal value divided 
by price. The hourly real wage column may also be referred to as
hourly wages in terms of 1929. The table shows that hourly wages of
employed persons did not fall as rapidly as cost of living during the
depression. Hourly real wages, therefore, have increased, as can be seen
from Chart 148. This, of course, does not imply that the employed wage
earner's position improved during the depression, since the number of
hours worked per week was frequently shortened. It does indicate, how- 
ever, that one hour's labor enables the worker to command more com- 
modities and services than formerly. It is to be noticed that a cost of 
living index (rather than an index of the general price level, or a wholesale 
commodity price index) is used as a deflator. Unless a deflator is used 
that pertains to the data being deflated, a satisfactory measure of phys- 
ical volume cannot be obtained. 
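
Deflating is a one-line computation once the price index is expressed relative to its base. The sketch below applies it to the first five years of Table 79; the function name and the `base` parameter are illustrative choices.

```python
# Hourly wage (cents) and cost of living index (1929 = 100), from Table 79.
HOURLY_WAGE = {1929: 59.0, 1930: 58.9, 1931: 56.4, 1932: 49.8, 1933: 49.1}
COST_OF_LIVING = {1929: 100.0, 1930: 96.6, 1931: 87.1, 1932: 77.8, 1933: 74.8}


def deflate(value, price_index, base=100.0):
    """Express a money figure in base-period prices: value / (index / base)."""
    return value / (price_index / base)


real_wage = {year: deflate(HOURLY_WAGE[year], COST_OF_LIVING[year])
             for year in HOURLY_WAGE}

for year, wage in real_wage.items():
    print(year, round(wage, 1))
```

Rounded to one decimal, these reproduce column 4 of the table for the years shown. The same function serves for any value series, provided the index chosen as deflator pertains to the data being deflated.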

Chart 148. Nominal Hourly Wages, Cost of Living, and Real Hourly Wages, 1929-1937.
(Vertical axis: per cent. Data are from Table 79, but wage data have been expressed
as percentages of 1929 to facilitate comparison.)

Securing comparability. Statisticians for trade associations experience
considerable difficulty in obtaining prompt reports from all members. For
instance, 93 firms might report on time one month and 96 the next, the
latter not necessarily, however, including all the 93 firms. To be strictly
accurate, a new time series should be constructed each month for the entire
period including all of, and only, those firms which reported promptly for the
month in question. Thus, a complete time series one month would be computed
for the 93 firms, and the next month for 96. This is a very laborious
procedure. An easier procedure is to make a preliminary estimate by computing
the percentage of the preceding period for only those firms which reported
promptly in the two consecutive months, and to multiply the figure
for the preceding month (which now includes all firms) by this percentage.
A revised figure can be computed when all the reports have been obtained.
If an industry is expanding and new firms are appearing, it is, of course,
desirable to include them. Increased employment and production may
result from increased activity of existing firms or the appearance of new
ones. Similarly, firms may cease to exist and must be dropped from a
reporting list.
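
The easier procedure described above, a link computed from identical firms and applied to the preceding month's complete figure, can be sketched as follows. The firm names and values are hypothetical, as is the function name.

```python
def preliminary_estimate(previous_total, previous_reports, current_reports):
    """Estimate the current all-firm total from firms that reported promptly
    in both months.

    previous_total   -- last month's figure covering all firms
    previous_reports -- {firm: value} for last month
    current_reports  -- {firm: value} received promptly this month
    """
    matched = previous_reports.keys() & current_reports.keys()
    if not matched:
        raise ValueError("no firm reported in both months")
    previous_matched = sum(previous_reports[f] for f in matched)
    current_matched = sum(current_reports[f] for f in matched)
    link = current_matched / previous_matched    # ratio to the preceding month
    return previous_total * link


# Hypothetical reports; firm names and values are illustrative only.
last_month = {"A": 100.0, "B": 250.0, "C": 50.0, "D": 80.0}
this_month = {"A": 110.0, "B": 260.0, "D": 84.0, "E": 40.0}   # C is late, E is new

print(preliminary_estimate(sum(last_month.values()), last_month, this_month))
```

Firm C, which is late, and firm E, which is new, play no part in the link; E would enter the series only when the revised figure is computed from complete reports.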

Another source of incomparability may be that the unit of reporting 
has changed. If it is merely a question of changing from a pound basis 
to a ton basis, this is a simple matter. Where the product has changed 
in kind, however, it is difficult to find a satisfactory solution. How, for 
instance, can we compare the physical production of radios between 1925 
and 1935? Not only was there a difference in the proportion of radios 
of different grades sold in the two years, but radios that were the same 
with respect to price, weight, number of tubes, or any other readily meas- 
urable characteristic, were still vastly different in their capacity to render 
utility to the consumer.

CHAPTER XV

ANALYSIS OF TIME SERIES 

SECULAR TREND 


Objects and Method 

There are two important reasons for attempting to describe the trend 
of a series by some kind of curve. First, it may be desired to measure 
the deviations from trend. These deviations consist of cyclical, seasonal, 
and accidental movements. Frequently the obtaining of these deviations 
is but one step in attempting to isolate cycles, in order to study them. 
Second, it may be desired to study the trend itself, in order to note the 
effect of factors bearing on the trend, to compare one trend with another, 
to discover what effect trend movements have on cyclical fluctuations, or 
to forecast future trend movements. 

The purpose for which measurements are made partly determines the 
methods adopted. If the object is solely to isolate cycles, it seems reason- 
able to suppose that the trend line chosen should pass through the cycles 
in such a way as approximately to allow a balancing between the positive 
and negative phases of each cycle. Whether a curve is deemed to have 
accomplished this object depends, of course, upon our conception of what 
constitutes a cycle in each case. If, on the other hand, the object is to 
make comparisons, generalizations, or forecasts, the curve should be not 
only logical, but also of such a nature that it can readily be expressed by 
a mathematical formula. By so doing, a person can, for instance, say 
that at a given time a series shows a certain rate or a certain amount of 
growth per annum, and that, if this tendency continues, the trend will
reach a certain value at some specified time in the future. Fitting a trend 
by a mathematical formula does not, however, remove the subjective ele- 
ment from trend fitting. The statistician can vary somewhat the shape 
of the curve by selection of the type of formula he employs, or the years
to which he fits the curve. It remains true, therefore, that the statistician
decides in advance, upon as objective and logical a basis as possible, what
he thinks the trend ought to look like, and then selects the mathematical
method that will closely approximate this result.

Trend Fitted by Inspection

The simplest method of describing a trend graphically is merely to draw 
it freehand, or perhaps to make use of a transparent ruler or a French 
curve. It is well to plot the data on semi-logarithmic paper also, for the 
trend may tend to straighten out if so plotted. The trend will be a straight 
line on this type of paper if the series is increasing or decreasing at a
constant rate. 
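
As a small check of this remark, the sketch below builds a hypothetical series growing at a constant rate and compares the steps of its logarithms with the steps of the raw values. The starting value, the 10 per cent rate, and the function name are assumptions made only for illustration.

```python
import math


def constant_rate_series(start, rate, periods):
    """Values growing by the same percentage each period."""
    return [start * (1 + rate) ** t for t in range(periods)]


# Hypothetical series: 100 growing 10 per cent per year.
series = constant_rate_series(start=100.0, rate=0.10, periods=6)
logs = [math.log10(v) for v in series]

# Equal steps in the logarithms mean a straight line on a chart whose
# vertical scale is logarithmic (semi-logarithmic paper).
log_steps = [b - a for a, b in zip(logs, logs[1:])]
# The steps in the raw values keep growing, so the arithmetic chart curves.
raw_steps = [b - a for a, b in zip(series, series[1:])]

print([round(s, 4) for s in log_steps])
print([round(s, 2) for s in raw_steps])
```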

An attempt was made by one of the writers to fit, by inspection, a trend 
to the annual rayon consumption data: the results are shown in Chart 149. 


Chart 149. Trend Line Fitted by Inspection to Total Domestic Consumption of Rayon
in the United States, 1919-1937. (Vertical scale in thousands of pounds. Data of
Table 80.)

This highly subjective method is open to the objection that may be made 
to all subjective methods: the statistician determines what answer he
wants and then proceeds to obtain it. But, as has been said, he can ac- 
complish very nearly the same result merely by careful selection from 
among numerous available mathematical methods. The most valid ob- 
jection is that subjective methods require the exercise of a high level of 
judgment and the statistician therefore cannot safely have such work done 
by hired clerks. 


Moving Averages 

Simple moving averages and annual trend values. A simple and flexible 
mathematical method of trend fitting is the moving average. The process 
of computing a 3-year moving average is shown in Table 80, United States
rayon consumption figures being used for illustrative purposes. In column 4,
the average 12,587 is (9,291 + 8,718 + 19,751) ÷ 3; 17,739 is the average
of 8,718, 19,751, and 24,747; and so on. It should be noted that the
moving average figures are placed opposite the center of the 3-year periods
to which they refer, for the same reason that figures referring to a whole
period are customarily plotted on a chart in the middle of the appropriate
spaces.

TABLE 80

Computation of 3-Year Moving Average of United States
Consumption of Rayon, 1919-1937

(Thousands of pounds)

Year  Consumption      3-year      3-year
                       moving      moving
                       total       average
(1)       (2)           (3)         (4)

1919       9,291
1920       8,718      37,760      12,587
1921      19,751      53,216      17,739
1922      24,747      77,056      25,685
1923      32,558      99,548      33,183
1924      42,243     133,078      44,359
1925      58,277     161,150      53,717
1926      60,630     218,955      72,985
1927     100,048     260,779      86,926
1928     100,101     331,597     110,532
1929     131,448     349,517     116,506
1930     117,968     406,776     135,592
1931     157,360     427,369     142,456
1932     152,041     521,284     173,761
1933     211,883     558,695     186,232
1934     194,771     659,330     219,777
1935     252,676     745,041     248,347
1936     297,594     811,465     270,488
1937     261,195

Source: Textile Economic Bureau, Rayon Organon, Vol. IX, No. 2, January 21,
1938, p. 16.
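
The computation of Table 80 can be sketched in a few lines. The consumption figures are those of the table; the function and variable names are assumptions chosen for this illustration.

```python
# United States rayon consumption, thousands of pounds (Table 80).
CONSUMPTION = {
    1919: 9291, 1920: 8718, 1921: 19751, 1922: 24747, 1923: 32558,
    1924: 42243, 1925: 58277, 1926: 60630, 1927: 100048, 1928: 100101,
    1929: 131448, 1930: 117968, 1931: 157360, 1932: 152041, 1933: 211883,
    1934: 194771, 1935: 252676, 1936: 297594, 1937: 261195,
}


def moving_average(values, n):
    """Return the n-term moving totals and moving averages, in order."""
    totals = [sum(values[i:i + n]) for i in range(len(values) - n + 1)]
    return totals, [t / n for t in totals]


years = sorted(CONSUMPTION)
values = [CONSUMPTION[y] for y in years]
totals, averages = moving_average(values, 3)

# Each average is placed opposite the middle year of its 3-year period,
# so the first and last years receive no trend value.
centered_years = years[1:-1]
for year, total, average in zip(centered_years, totals, averages):
    print(year, total, round(average))
```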

Chart 150 shows three different moving averages fitted to these data. 
All are bad fits. The 3-year moving average traces an inverse cycle. Because
cycles in the rayon industry appear to last about two years, a 3-year
moving average always includes either two depression years and one year
of prosperity, or two good years and one bad year. On the other hand,
the 5-year moving average, since it always includes either two cyclical
peaks and three troughs, or three peaks and two troughs, dips down into
the troughs and reaches up into the peaks. The 7-year moving average
follows the same general pattern as the 3-year, but is smoother.

Chart 150. 3-Year, 5-Year, and 7-Year Moving Average Trends Fitted to Rayon
Consumption, 1919-1937. (For purposes of comparison the different curves have been
plotted close together on the same chart instead of on separate charts. Each curve is
plotted to the same vertical scale, but at a different level. This arrangement is
sometimes referred to as a multiple axis chart. For original data and 3-year moving
average see Table 80.)

From the reasoning of the preceding paragraph it must be obvious that,
unless the moving average is the same length as the movement being
smoothed (or some integral multiple thereof), the moving average will
vary either directly or inversely with the undulations of that movement.
If, however, it is an integral multiple of the wave length of the series (say,
once, twice, or three times the average number of years or months in such
movements), any particular moving average value will contain the same
number of peaks as troughs, and the moving average will tend to smooth
out the fluctuations which are sought to be eliminated. This principle,
that the period of the moving average should conform to the duration of the
movement being smoothed, is probably the most important one to observe
in the use of moving averages.
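
The principle can be illustrated with an artificial series. The sketch below assumes a hypothetical straight-line trend with a cycle exactly two years long superimposed on it (none of these figures come from the rayon data) and measures how far each moving average departs from the true trend at a cyclical peak.

```python
def moving_average(values, n):
    """n-term moving average; for even n, centered by a further 2-term average."""
    simple = [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]
    if n % 2 == 0:
        simple = [(a + b) / 2 for a, b in zip(simple, simple[1:])]
    return simple


# Hypothetical series: straight-line trend plus a cycle exactly 2 years long
# (peaks in even years, troughs in odd years).
trend = [10.0 * t for t in range(20)]
cycle = [3.0 if t % 2 == 0 else -3.0 for t in range(20)]
series = [tr + cy for tr, cy in zip(trend, cycle)]


def deviation_at_peak(n, peak=10):
    """Moving average minus the true trend at a peak year."""
    lost = n // 2  # values lost at each end of the series
    return moving_average(series, n)[peak - lost] - trend[peak]


for n in (2, 3, 4, 5, 7):
    print(n, round(deviation_at_peak(n), 3))
```

A negative figure at a peak means the moving average moves inversely to the cycle, a positive figure that it moves directly with it, and zero that the cycle has been removed. In this sketch the 3- and 7-year averages are inverse, the 5-year average is direct, and the 2- and 4-year centered averages leave no cycle.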

Since the duration of rayon cycles appears to be about two years, it
would therefore be better to try a moving average of two or four years.
An extra step, however, is involved in this process, illustrated by Table 81.
In column 3 of this table, 18,009, which is the total of the two years 1919
and 1920, is placed between these two years; and in like manner is 28,469
placed between 1920 and 1921. A 2-year moving average is then taken
of the 2-year moving total in order to center the figures opposite, rather
than between, years. The computation of a centered 4-year moving average
is shown by Table 82. The procedure is similar to that in Table 81,
except that two short cuts are used. First, as a matter of convenience,
column 3 figures are placed opposite second years instead of between
second and third years. Second, in order to eliminate one series of divisions,
each consecutive pair of moving total figures is added together
and the result is divided by 8 (or, to save work, multiplied by the reciprocal
of 8, which is .125). This gives the moving average figures of column 5.
It should be noticed that the first entry in this column is opposite 1921,
which is the third year.
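
The centering step can be sketched as below, using the short cut described for Table 82: each consecutive pair of moving totals is added and the sum divided by twice the period. The function name is an assumption; the data are the rayon consumption figures of Table 80.

```python
# United States rayon consumption, 1919-1937, thousands of pounds (Table 80).
YEARS = list(range(1919, 1938))
VALUES = [9291, 8718, 19751, 24747, 32558, 42243, 58277, 60630, 100048,
          100101, 131448, 117968, 157360, 152041, 211883, 194771, 252676,
          297594, 261195]


def centered_moving_average(values, n):
    """Centered n-term moving average for an even period n.

    Adds each consecutive pair of n-term moving totals and divides by 2n.
    """
    if n % 2 != 0:
        raise ValueError("centering is needed only for an even period")
    totals = [sum(values[i:i + n]) for i in range(len(values) - n + 1)]
    return [(a + b) / (2 * n) for a, b in zip(totals, totals[1:])]


for n in (2, 4):
    averages = centered_moving_average(VALUES, n)
    first_year = YEARS[n // 2]  # n/2 years are lost at each end
    print(f"{n}-year: first value {averages[0]:,.1f} opposite {first_year}")
```

For the 4-year case the first value is 148,281 ÷ 8, opposite 1921, in agreement with the description of Table 82.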

A glance at Chart 151 reveals that either the 2- or the 4-year moving
average is superior to the 3- or 5-year moving average. The 2-year moving
average passes through the center of each cycle. Mathematically this
must be true since the cycles are two years in length. It may appear to
the reader that what we describe as a centered 2-year moving average is
really a 3-year moving average with the middle year given double weight.
This is true, but it may also be regarded as an estimate of consumption
during a 2-year period consisting of the middle year plus the half year
preceding and the half year following. Although the 2-year moving average
goes through the center of each cycle, it has more bends in it than are
ordinarily considered appropriate for a trend. The 4-year moving average,
though not passing so closely through the centers of cycles, is smoother,
and on the whole perhaps better. On the other hand, the 6-year moving
average is undesirable since it seems to lie above a reasonable trend for
the years 1922, 1923, and 1924. It is inevitably true that a moving average,
unless the items are elaborately weighted, will smooth out not only
the undesirable irregularities, but also part of the curve which it is sought
to approximate. Thus the moving average will fall below a trend which
is concave downward and above one which is concave upward (as shown
in the small chart of a three-year moving average of data which are concave
upward). Furthermore, the 6-year moving average eliminates (or necessitates
estimates for) the first three and the last three years, leaving only
thirteen of the nineteen years with trend values! Of course, the curve can
be extended freehand in each direction, a highly subjective procedure, or a
remedy may be adopted that is nearly as bad as the disease itself. Estimates
can be made of values for the original data for several years on either side
of the actually known data. In this case estimates may be made for 1916,
1917, 1918, and 1938, 1939, 1940. The 6-year moving average will then be
affected by these hypothetical data, and the trend will run from 1919
through 1936. There is grave doubt concerning the validity of this
procedure, but some authorities hold that, although we do not know with
certainty concerning these periods, it is better to use what partial
knowledge we possess than to use none whatever.

TABLE 81

Computation of Centered 2-Year Moving Average of United States Rayon
Consumption, 1919-1937

(Thousands of pounds)

Year  Consumption     2-year      2-year     2-year       Centered
                      moving      moving     moving total 2-year
                      total       average    of 2-year    moving
                                             moving       average
                                             average
(1)       (2)          (3)         (4)         (5)          (6)

1919       9,291
                     18,009     9,004.5
1920       8,718                            23,239.0    11,620
                     28,469    14,234.5
1921      19,751                            36,483.5    18,242
                     44,498    22,249.0
1922      24,747                            50,901.5    25,451
                     57,305    28,652.5
1923      32,558                            66,053.0    33,026
                     74,801    37,400.5
1924      42,243                            87,660.5    43,830
                    100,520    50,260.0
1925      58,277                           109,713.5    54,857
                    118,907    59,453.5
1926      60,630                           139,792.5    69,896
                    160,678    80,339.0
1927     100,048                           180,413.5    90,207
                    200,149   100,074.5
1928     100,101                           215,849.0   107,924
                    231,549   115,774.5
1929     131,448                           240,482.5   120,241
                    249,416   124,708.0
1930     117,968                           262,372.0   131,186
                    275,328   137,664.0
1931     157,360                           292,364.5   146,182
                    309,401   154,700.5
1932     152,041                           336,662.5   168,331
                    363,924   181,962.0
1933     211,883                           385,289.0   192,644
                    406,654   203,327.0
1934     194,771                           427,050.5   213,525
                    447,447   223,723.5
1935     252,676                           498,858.5   249,429
                    550,270   275,135.0
1936     297,594                           554,529.5   277,265
                    558,789   279,394.5
1937     261,195

Source: See Table 80.

TABLE 82

Computation of Centered 4-Year Moving Average of United States Rayon
Consumption, 1919-1937

(Thousands of pounds)

Year  Consumption     4-year    2-year moving    Centered
                      moving    total of         4-year
                      total     4-year moving    moving average
                                total            [Col. 4 ÷ 8]
(1)       (2)          (3)         (4)              (5)

1919       9,291
1920       8,718      62,507
1921      19,751      85,774       148,281      18,535
1922      24,747     119,299       205,073      25,634
1923      32,558     157,825       277,124      34,640
1924      42,243     193,708       351,533      43,942
1925      58,277     261,198       454,906      56,863
1926      60,630     319,056       580,254      72,532
1927     100,048     392,227       711,283      88,910
1928     100,101     449,565       841,792     105,224
1929     131,448     506,877       956,442     119,555
1930     117,968     558,817     1,065,694     133,212
1931     157,360     639,252     1,198,069     149,759
1932     152,041     716,055     1,355,307     169,413
1933     211,883     811,371     1,527,426     190,928
1934     194,771     956,924     1,768,295     221,037
1935     252,676   1,006,236     1,963,160     245,395
1936     297,594
1937     261,195

Source: See Table 80.
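
The remark that an unweighted moving average falls below a trend which is concave downward, and lies above one which is concave upward, can be checked on two artificial curves. The sketch assumes hypothetical smooth trends with no cyclical or irregular movement at all, so that any gap is due solely to the averaging.

```python
def moving_average(values, n):
    """Simple n-term moving average for odd n, centered on the middle term."""
    return [sum(values[i:i + n]) / n for i in range(len(values) - n + 1)]


# Hypothetical smooth trends, chosen only for illustration.
concave_upward = [float(t * t) for t in range(10)]
concave_downward = [100.0 - t * t for t in range(10)]


def gaps(trend, n=3):
    """Moving average minus the true trend at each centered position."""
    half = n // 2
    averages = moving_average(trend, n)
    return [m - t for m, t in zip(averages, trend[half:len(trend) - half])]


print([round(g, 3) for g in gaps(concave_upward)])    # all positive
print([round(g, 3) for g in gaps(concave_downward)])  # all negative
```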

Chart 151. Centered 2-Year, 4-Year, and 6-Year Moving Average Trends Fitted to
Rayon Consumption, 1919-1937. (For purposes of comparison the different curves
have been plotted close together on the same chart instead of on separate charts. Each
curve is plotted to the same vertical scale, but at a different level. This arrangement
is sometimes referred to as a multiple axis chart. For original data and 2-year and
4-year moving averages see Tables 81 and 82.)

Obtaining monthly trend values from annual moving averages. In the
above illustrations, moving averages have been fitted to annual data.
Monthly values can be obtained by interpolation between annual averages.
For instance, in Table 83 the 1927 and 1928 trend values are 88,910 and
105,224 respectively. The increase is 16,314 during the intervening year,
or 1,359.50 per month. Technically, 88,910 should be centered between
June and July 1927, the July value being 88,910 + (1,359.50 ÷ 2) = 89,589.75.
The monthly trend values are then obtained by successive additions or
subtractions of 1,359.50 (see Table 83).
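
The interpolation just described can be sketched as follows. The two annual trend values are those quoted in the text; the function name and the month list are assumptions made for this illustration.

```python
MONTHS = ["July", "August", "September", "October", "November", "December",
          "January", "February", "March", "April", "May", "June"]


def monthly_trend(value_this_year, value_next_year):
    """Interpolate 12 monthly trend values between two annual trend values.

    Each annual value is treated as centered between June and July, so the
    first month (July) lies half a monthly step beyond it.
    """
    step = (value_next_year - value_this_year) / 12
    first = value_this_year + step / 2
    return step, [first + k * step for k in range(12)]


# Centered 4-year moving averages for 1927 and 1928 (thousands of pounds).
step, values = monthly_trend(88910, 105224)
print(f"monthly increment = {step:,.2f}")
for name, value in zip(MONTHS, values):
    print(f"{name:<10}{value:>12,.2f}")
```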

TABLE 83

Interpolating Annual Moving Averages to Obtain
Monthly Trend Values of United States Rayon
Consumption, July 1927 through June 1928

(Thousands of pounds)

Year and month      Annual trend    Monthly trend
                    values          values

1927:
  June
                        88,910
  July                                 89,589.75
  August                               90,949.25
  September                            92,308.75
  October                              93,668.25
  November                             95,027.75
  December                             96,387.25
1928:
  January                              97,746.75
  February                             99,106.25
  March                               100,465.75
  April                               101,825.25
  May                                 103,184.75
  June                                104,544.25
                       105,224
  July

Source: Table 82.

Three-Year Moving Average (Broken Line) of a Series of Data Which Are Concave
Upward.

If it is desired to fit a moving average trend to monthly data that have
not been adjusted for seasonal variation, we should take some integral
multiple of twelve months (approximating the average duration of the
cycle); otherwise remnants of seasonal movements, either positive or
inverse, will appear in the trend. For the rayon data, 24 or 48 months
would be appropriate. Such a method gives fairly satisfactory
results for these data, although, being an even number of periods, the
moving average requires centering. If, however, the cycles averaged 3½
years in length, it would be necessary to use 7 × 12 = 84 months in the
moving average. More difficult would be the problem if the average cycle
length were some inconvenient figure such as 3⅚ years, which would
require 276 months to fulfill strictly the requirements laid down above!
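
The period required in each of these cases is the smallest number of months that is an integral multiple of both twelve months and the cycle length. A minimal sketch follows; it reads the two cycle lengths as 3 1/2 years (42 months) and 3 5/6 years (46 months), the latter being the reading consistent with the 276 months stated, and the function name is an assumption.

```python
from fractions import Fraction
from math import gcd


def required_months(cycle_years):
    """Smallest whole number of months divisible by both 12 and the cycle length."""
    cycle_months = Fraction(cycle_years) * 12
    if cycle_months.denominator != 1:
        raise ValueError("cycle length must be a whole number of months")
    months = cycle_months.numerator
    # Least common multiple of 12 and the cycle length in months.
    return 12 * months // gcd(12, months)


print(required_months(2))                # rayon cycles of about two years
print(required_months(Fraction(7, 2)))   # cycles of 3 1/2 years
print(required_months(Fraction(23, 6)))  # cycles of 3 5/6 years
```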

Of course, if the data have previously been adjusted for seasonal movements
(see Chapter XVII), this difficulty disappears. Again taking the
rayon data, turning points of which may be read from Chart 203, page 555,
we have computed the average cycle length to be 24.45 months, as is
shown by the following analysis. In order to avoid centering, a period of
25 months would therefore seem appropriate.

Computation of Average Cycle Length of Rayon
Consumption

Turning point               Peak to peak    Trough to trough
                            (Months)        (Months)

Peak: April 1923
Trough: February 1924
Peak: May 1925                  26
Trough: June 1926                                 28
Peak: May 1927                  24
Trough: July 1928                                 25
Peak: June 1929                 25
Trough: October 1930                              27
Peak: May 1931                  23
Trough: May 1932                                  19
Peak: June 1933                 25
Trough: September 1934                            28
Peak: July 1935                 25
Trough: March 1936                                18

Average length                  24.7              24.2
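
The averages in this tabulation can be reproduced from the durations as listed. The sketch below takes the listed durations as given rather than recomputing them from the dates, and the variable names are assumptions.

```python
from statistics import mean

# Durations in months, as listed in the tabulation of turning points.
PEAK_TO_PEAK = [26, 24, 25, 23, 25, 25]
TROUGH_TO_TROUGH = [28, 25, 27, 19, 28, 18]

peak_average = round(mean(PEAK_TO_PEAK), 1)
trough_average = round(mean(TROUGH_TO_TROUGH), 1)
# The figure quoted in the text is the mean of the two rounded averages.
average_cycle = (peak_average + trough_average) / 2

print(peak_average, trough_average, round(average_cycle, 2))
```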


Moving averages: summary. From what has been said it should be
apparent that the fitting of a trend by a moving average is a procedure
requiring the exercise of considerable judgment. It may be well therefore
to conclude by summarizing some of the characteristics of moving averages.

(1) A moving average smoothes out fluctuations, provided its period 
is some integral multiple of the length of the movement to be smoothed. 
Accordingly, irregular movements are usually smoothed by a 3- or 5-month 
moving average, seasonals by a 12-month moving average, and cycles by 
a somewhat longer moving average. Since cycles usually vary in duration 
with the passage of time, this casts doubt on the appropriateness of a mov- 
ing average trend for certain series. 

(2) If the moving average is for an even number of periods, the result- 
ing values must be centered by a 2-period moving average. 
(3) The larger the number of items used in the average, the smoother
it becomes, for the less important, relatively, becomes any item which is 
added or dropped. 

(4) The larger the number of items, the greater the tendency to iron 
out, not only the fluctuations smaller in duration than the period of the 
average, but also part of the curvature of a non-linear trend itself. 

(5) The larger the number of items in the period, the greater the number
of trend values on each end which must remain unknown, or estimated.
Specifically, if the average embraces N periods, there will be (N - 1)/2 trend
values omitted at each end; but N/2 if N is even and the moving average is
centered.

(6) A moving average is a descriptive measure rather than a summary 
measure. Thus, to say that a trend is described by an eleven-year moving 
average throws no light on the way in which the series grows or declines. 
Also, a moving average states no "law" of change. Nevertheless, the moving
average is useful as a step preliminary to deciding on the final type of
trend, and as a final step when the trend is not well defined or does not 
conform to any reasonably simple mathematical equation type. 
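
Characteristic (5) above can be put in the form of a small function. This is only a sketch; the function name is an assumption, and even periods are taken to be centered by a 2-period moving average as in characteristic (2).

```python
def values_lost_each_end(n):
    """Trend values missing at each end for an n-period moving average."""
    if n < 1:
        raise ValueError("the period must be at least 1")
    if n % 2 == 0:
        return n // 2        # even period, centered: N/2
    return (n - 1) // 2      # odd period: (N - 1)/2


# With the nineteen annual rayon figures (1919-1937):
for n in (3, 4, 6):
    lost = values_lost_each_end(n)
    print(f"{n}-year average: {lost} lost at each end, {19 - 2 * lost} remain")
```

For the 6-year average this gives three years lost at each end and thirteen years with trend values, as stated earlier for the rayon series.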

Straight Line Trend 

A mathematical equation not only is a descriptive measure of the trend, 
but also gives a concise definition of that trend. If the trend itself is to 
be studied, or is to be extended beyond the data, it is especially desirable 
that it be so determined that it can be described by a mathematical 
equation. 

Description. The simplest type of curve is the straight line, which is
described by an equation of the type Yc = a + bX, in which X is the
independent variable and Yc the trend value of the dependent variable.¹
Since their values must be determined for each of the series being analyzed,
a and b are referred to as unknowns. They are also called constants since,
once their values are determined, they do not change.

To take the simplest case, suppose that a = 0 and b = 1. The equation
then becomes: Yc = X; and this means that with each increase of
one unit of the independent variable, the dependent variable also increases
one unit. This equation is plotted in the upper left-hand section of Chart
152. Incidentally, it should be observed that all four quadrants are shown
in this chart. Before attempting to plot a curve, it is well to draw up a
table of X and Yc values, as shown on the chart, in which are recorded
the computed values of Y that correspond to selected values of X. As a
matter of fact, only two points are needed to plot this or any straight line,
and most accurate results are obtained by using two X values a considerable
distance from each other.

Other straight line equations and their curves are shown in the other
sections of Chart 152, an inspection of which yields the following information:
a is the value of Y when X is 0 (the Y value at the X origin), or, as
it is frequently termed, the Y intercept; while b indicates the steepness, or
slope, of the line. When b is positive, the slope is upward; when b is
negative, the slope is downward.

¹ The symbol Y will be used to designate an observed value of the dependent variable;
while Yc indicates a value that has been computed, usually from a mathematical
equation.


TABLE 84

Computation of Semi-Averages Trend for Electric Power
Production, 1921-1930

(Millions of kilowatt hours)

Year    Average monthly    Semi-averages    Trend values
        production
(1)          (2)               (3)              (4)

1921        3,415                             3,380.0
1922        3,971                             3,933.2
1923        4,639            4,486.4          4,486.4
1924        4,918                             5,039.6
1925        5,489                             5,592.8
1926        6,149                             6,146.0
1927        6,684                             6,699.2
1928        7,321            7,252.4          7,252.4
1929        8,113                             7,805.6
1930        7,995                             8,358.8

Source: United States Department of Commerce, Survey of Current Business,
1936 Supplement, p. 85.


Method of selected points. Since the location of only two points is
necessary to obtain a straight line equation, it is obvious that we may
select two representative points and connect them by a straight line. Of
course, this method is highly subjective. Typically, the data are divided
into halves, and an average is computed for each half. This is known as
the method of semi-averages, and is illustrated in Table 84, dealing with
electric power production. As indicated by column 3, the trend value for
1923 is 4,486.4, and that for 1928 is 7,252.4. This is an increase during
the 5-year period of 7,252.4 - 4,486.4 = 2,766.0. The annual increment
is of course one-fifth of that amount, or 553.2. It is therefore apparent
that this trend can be described mathematically by the equation Yc =
4,486.4 + 553.2X, with origin at 1923. The trend values are most easily
found by successively adding 553.2 to 4,486.4, except that, for 1922 and
1921, subtractions are involved instead. As indicated by Table 84, the
1921 trend value is 3,380.0. The equation could therefore be written
Yc = 3,380.0 + 553.2X, with origin at 1921.
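
The method of semi-averages can be sketched as below, using the electric power figures of Table 84. The function names are assumptions, and the sketch handles only the case treated in the text, an even number of years divided into two equal halves.

```python
from statistics import mean

# Average monthly electric power production, millions of kilowatt hours (Table 84).
PRODUCTION = {1921: 3415, 1922: 3971, 1923: 4639, 1924: 4918, 1925: 5489,
              1926: 6149, 1927: 6684, 1928: 7321, 1929: 8113, 1930: 7995}


def semi_average_trend(data):
    """Return (origin year, trend value at origin, annual increment)."""
    years = sorted(data)
    if len(years) % 2:
        raise ValueError("this sketch assumes an even number of years")
    half = len(years) // 2
    first, second = years[:half], years[half:]
    # Each semi-average is placed opposite the middle of its half.
    x1, x2 = mean(first), mean(second)
    y1 = mean(data[y] for y in first)
    y2 = mean(data[y] for y in second)
    return x1, y1, (y2 - y1) / (x2 - x1)


origin, a, b = semi_average_trend(PRODUCTION)


def trend_value(year):
    """Trend value Yc = a + bX, with X measured in years from the origin."""
    return a + b * (year - origin)


print(f"Yc = {a:,.1f} + {b:,.1f}X, origin at {origin}")
for year in sorted(PRODUCTION):
    print(year, f"{trend_value(year):,.1f}")
```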


Chart 153. Straight-Line Trends Fitted by Method of Semi-Averages to Average
Monthly Production of Electric Power, 1921-1930 and 1931-1938. (Vertical scale in
billions of kilowatt hours. Original data from U. S. Department of the Interior,
Geological Survey, as quoted by U. S. Department of Commerce, Survey of Current
Business: 1936 Supplement, p. 85; Vol. 17, February 1937, p. 41; Vol. 18, March 1938,
p. 81.)

This method is to be commended for its simplicity and is used to some
extent in practical work, but nearly all statisticians prefer the more refined
method described in the ensuing pages.²

The results of the semi-average procedure are shown in Chart 153. This
chart shows two trends fitted by the method of semi-averages: one is fitted
to 1921-1930 data and the other to 1931-1938 data. This sharp break
in the trend may seem to the reader to be inconsistent with the idea of
what a trend really is. Many economists believe that continuity and stability
constitute the very essence of the trend. On the other hand, there
are some who believe that the depression which reached its trough in
1932-1933 was not a mere business cycle. It represented a breakdown
in our economic order. The old trends did indeed continue, but from a
lower level; or we can say that they were set back about five years. Chart
153 illustrates this concept, although it does not necessarily commit the
authors to this view.

² The method referred to is the method of least squares. For the same data it gives
the equation Yc = 3,426.25 + 542.922X, with origin at 1921, which compares with
Yc = 3,380.0 + 553.2X by the method of semi-averages.

Method of least squares. A more refined method, which can be applied
to more complex types of trends, is favored by most statisticians. This
method is designed to accomplish two results.


Chart 154. Straight-Line Trends Fitted by Method of Least Squares to United
States Magazine Advertising. (Vertical scale in millions of lines. Data of Tables 88
and 89.)

1. The sum of the vertical deviations from the straight line must equal zero.
If we should connect by a vertical line each observed figure (1918-1930)
in Chart 154 with the dashed trend, the vertical lines extending
upward from the trend would exactly balance those extending downward.
It would not suffice, however, merely to set the sum of the deviations
equal to zero, since any straight line (other than vertical) passing through
the point of means X̄, Ȳ would fulfill the requirement. This condition would
even be satisfied by a straight line sloping at right angles to the trend.

2. The sum of the squares of all these deviations, both above and below the
trend line, must be less than the sum of the squares from any other conceivable
straight line. It is because of this second characteristic of such a line
that the method of fitting it to obtain this result is called the method of
least squares.³ In fitting a curve to meet the second requirement, the first
requirement is automatically satisfied.⁴

In a sense, then, a trend line fitted by the method of least squares is
analogous to the arithmetic mean, for the latter measure is a single value,
rather than a series of values, summarizing a statistical series which
possesses the two characteristics mentioned above.
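
The two requirements can be checked numerically on the magazine advertising figures for 1918-1930. The sketch below fits the line with the standard closed-form least-squares solution and compares it with a rival line that also passes through the point of means; the rival slope of 50 is arbitrary and hypothetical, and all names are assumptions.

```python
# United States magazine advertising, 1918-1930, thousands of lines per month.
ADVERTISING = [1547, 2142, 2803, 1856, 2030, 2520, 2620, 2623, 2958,
               3038, 3032, 3384, 2984]
X = list(range(len(ADVERTISING)))  # origin at 1918, units of one year


def least_squares_line(x, y):
    """Closed-form least-squares estimates of a and b in Yc = a + bX."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return (sy - b * sx) / n, b


def deviations(a, b, x, y):
    """Vertical deviations of the observations from the line."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


a, b = least_squares_line(X, ADVERTISING)
x_bar = sum(X) / len(X)
y_bar = sum(ADVERTISING) / len(ADVERTISING)

# A rival line through the point of means with an arbitrary slope.
rival_b = 50.0
rival_a = y_bar - rival_b * x_bar

for label, (a0, b0) in {"least squares": (a, b), "rival": (rival_a, rival_b)}.items():
    d = deviations(a0, b0, X, ADVERTISING)
    print(f"{label:<14} sum = {sum(d):10.4f}   sum of squares = {sum(v * v for v in d):14.1f}")
```

Both lines give deviations summing to zero (apart from rounding), but only the least-squares line gives the smaller sum of squares.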


TABLE 85

United States Magazine Advertising 1918-1930 and Observation
Equations for Straight Line Trend

(Magazine advertising in thousands of lines per month)

Year     X     Advertising    Observation equation
                    Y         Y = a + bX

1918     0        1,547       1,547 = a
1919     1        2,142       2,142 = a + b
1920     2        2,803       2,803 = a + 2b
1921     3        1,856       1,856 = a + 3b
1922     4        2,030       2,030 = a + 4b
1923     5        2,520       2,520 = a + 5b
1924     6        2,620       2,620 = a + 6b
1925     7        2,623       2,623 = a + 7b
1926     8        2,958       2,958 = a + 8b
1927     9        3,038       3,038 = a + 9b
1928    10        3,032       3,032 = a + 10b
1929    11        3,384       3,384 = a + 11b
1930    12        2,984       2,984 = a + 12b

Source: United States Department of Commerce, Survey of Current Business,
October 1933, p. 20; 1936 Supplement, p. 24; December 1936, p. 25; May 1937,
p. 26.


³ A distribution of chance errors follows the normal curve; and it can be
demonstrated that the greatest probability of obtaining deviations from some computed value
or series of values which are distributed in this fashion is obtained when the sum of
the squared deviations is at a minimum (see Appendix B, section XV-2). If the
deviations are distributed in this fashion, then the fitted value is most probable. If it is
believed that deviations from the appropriate norm are chance errors, it follows that
the method of least squares is the appropriate method of fitting. The method is also
convenient algebraically, as the student can observe in connection with correlation
analysis and analysis of variance. Time series fluctuations around a trend line are
not, however, independent accidental occurrences, and it is to be doubted that there
is any special reason for using the method of least squares in trend fitting, other than
for its convenience. Certain of the trends explained in this volume are, in fact, fitted
by other methods. Some statisticians even argue that the least squares criterion is not
appropriate for time series trends, since time series are sometimes characterized by
extreme deviations not in accordance with the normal law. The method of least
squares, of course, is particularly influenced by extreme deviations because of the
squaring process.

⁴ The mean of the Yc values is the same as the mean of the Y values. This is
demonstrated in Appendix B, Section XXII-1. Before reading that explanation, however,
the reader should peruse the remainder of this section.
Normal equations. In order to fit a straight line by the method of
least squares, two normal equations must be obtained and solved
simultaneously, since there are two constants or unknowns to be found. The
normal equations are obtained from observation equations. There are as
many observation equations as there are observations. In this case we
shall use United States magazine advertising 1918-1930 as an illustration;
hence there are thirteen observations. Since the straight line trend is of
the equation type Yc = a + bX, we can insert the observed values of X
and of Y in the expression Y = a + bX and obtain the thirteen observation
equations, as in Table 85. In this table 1918 has been taken as the
X origin, although any other year could have been set down as zero.

Each original observation equation is now multiplied by the coefficient
of a. Since the coefficient of a is 1 for each equation, the resulting
observation equations, which are shown in column 3 of Table 86, are unchanged.

TABLE 86

Derivation of Normal Equations from Observation Equations for United States
Magazine Advertising Data

Original            Coeffi-   First set of        Coeffi-   Second set of
observation         cient     observation         cient     observation
equations           of a      equations           of b      equations
Y = a + bX                    Y = a + bX          X         XY = aX + bX²
                              [Col. 1 × Col. 2]             [Col. 1 × Col. 4]
(1)                 (2)       (3)                 (4)       (5)

1,547 = a            1        1,547 = a            0
2,142 = a + b        1        2,142 = a + b        1         2,142 = a + b
2,803 = a + 2b       1        2,803 = a + 2b       2         5,606 = 2a + 4b
1,856 = a + 3b       1        1,856 = a + 3b       3         5,568 = 3a + 9b
2,030 = a + 4b       1        2,030 = a + 4b       4         8,120 = 4a + 16b
2,520 = a + 5b       1        2,520 = a + 5b       5        12,600 = 5a + 25b
2,620 = a + 6b       1        2,620 = a + 6b       6        15,720 = 6a + 36b
2,623 = a + 7b       1        2,623 = a + 7b       7        18,361 = 7a + 49b
2,958 = a + 8b       1        2,958 = a + 8b       8        23,664 = 8a + 64b
3,038 = a + 9b       1        3,038 = a + 9b       9        27,342 = 9a + 81b
3,032 = a + 10b      1        3,032 = a + 10b     10        30,320 = 10a + 100b
3,384 = a + 11b      1        3,384 = a + 11b     11        37,224 = 11a + 121b
2,984 = a + 12b      1        2,984 = a + 12b     12        35,808 = 12a + 144b

Normal equation              33,537 = 13a + 78b            222,475 = 78a + 650b

Source: Table 85.


Next, each original observation equation is multiplied by the coefficient of 
6, with the results shown in column 5. We now have two new sets of 
observation equations. Each of these sets is summed to obtain the normal 
equations shown at the bottom of the table. The two normal equations 

I. 33,537 - 13a + 785, 

11. 222,475 78a + 6506, 
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are the numerical representation of the following equation types 

I. S7 = ATa + 6SX, 

IL SZF = aSX + bXX^ 

In order to solve these two equations simultaneously, we may multiply
equation I by 6 (= 78 ÷ 13) and subtract the result from equation II,
thus obtaining an equation with one unknown, b:

I. 201,222 = 78a + 468b
II. 222,475 = 78a + 650b

21,253 = 182b

b = 116.7747

Having obtained b, we obtain a by substituting the value of b in
equation I. Thus

I. 33,537 = 13a + 78 (116.7747)
a = 1,879.121.

It is desirable to check the accuracy of the solution of the normal
equations, either by obtaining a by substitution of b in normal equation II,
or by substituting the values of a and of b in the second normal equation
as follows:

II. 222,475 = 78 (1,879.121) + 650 (116.7747)

= 222,474.99.

The trend equation may now be written

$Y_c = 1,879.121 + 116.7747X$,

with origin at 1918 and X units of 1 year.

Before proceeding with the discussion, let us summarize the general 
procedure for obtaining a trend equation of this type. (The procedure 
can be expanded for equations of higher degree.) 

(1) Set up an equation for each observation by inserting the observed
values of X and Y in the expression Y = a + bX.

(2) Multiply each original observation equation by the coefficient of
the first unknown, a.

(3) Multiply each original observation equation by the coefficient of
the second unknown, b.

(4) Sum each of the two resulting sets of observation equations. This
gives the two normal equations.

(5) Solve the two normal equations simultaneously for b.

(6) Substitute the value of b in equation I and obtain the value of a.

(7) Check the solution by substituting the values of a and b in
equation II.
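The seven steps above can be sketched in a few lines of Python. This is only an illustration using the Table 86 figures; the function names `normal_equations` and `solve_line` are hypothetical and not part of the text.

```python
# Average monthly magazine advertising, 1918-1930 (thousands of lines), Table 86.
Y = [1547, 2142, 2803, 1856, 2030, 2520, 2620,
     2623, 2958, 3038, 3032, 3384, 2984]
X = list(range(len(Y)))  # origin at 1918, X units of one year


def normal_equations(x, y):
    """Steps 1-4: return (N, sum X, sum X^2, sum Y, sum XY), the coefficients of
    I.  sum Y  = N*a       + b*sum X
    II. sum XY = a*sum X   + b*sum X^2
    """
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    sum_y = sum(y)
    sum_xy = sum(u * v for u, v in zip(x, y))
    return n, sum_x, sum_x2, sum_y, sum_xy


def solve_line(x, y):
    n, sum_x, sum_x2, sum_y, sum_xy = normal_equations(x, y)
    # Step 5: multiply equation I by (sum X / N) and subtract it from II.
    k = sum_x / n
    b = (sum_xy - k * sum_y) / (sum_x2 - k * sum_x)
    # Step 6: substitute b in equation I.
    a = (sum_y - b * sum_x) / n
    # Step 7: the right-hand side of equation II should reproduce sum XY.
    check = a * sum_x + b * sum_x2
    return a, b, check


a, b, check = solve_line(X, Y)
print(round(a, 3), round(b, 4), round(check, 2))
```

With the figures of Table 86 the multiplier `k` is 6, so the elimination is the same one carried out by hand above.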


® For derivation of these normal equations, see Appendix B, section XV-1. 
It is not necessary to set up an elaborate table such as Table 86.
Table 87 is much simpler. $\Sigma X$ need not be computed, nor is it necessary

TABLE 87

Straight Line Trend Fitted to Data of United States Magazine Advertising,
1918-1930

(Thousands of lines per month)

Year        X          Y         XY         Yc
1918        0      1,547          0      1,879
1919        1      2,142      2,142      1,996
1920        2      2,803      5,606      2,113
1921        3      1,856      5,568      2,229
1922        4      2,030      8,120      2,346
1923        5      2,520     12,600      2,463
1924        6      2,620     15,720      2,580
1925        7      2,623     18,361      2,697
1926        8      2,958     23,664      2,813
1927        9      3,038     27,342      2,930
1928       10      3,032     30,320      3,047
1929       11      3,384     37,224      3,164
1930       12      2,984     35,808      3,280
Total      78     33,537    222,475

Source: Table 86.

Normal equations:

I. $\Sigma Y = Na + b\Sigma X$;   33,537 = 13a + 78b;

II. $\Sigma XY = a\Sigma X + b\Sigma X^2$;   222,475 = 78a + 650b.

Trend equation:

$Y_c = 1,879.121 + 116.7747X$
Origin, 1918. X units, 1 year.

to show an $X^2$ column in this table since $\Sigma X$ and $\Sigma X^2$ can be looked up
directly in Appendix M. In this appendix the sum of the first 12 natural
numbers is shown to be 78, and the sum of the squares to be 650.* Table

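* If we solve the two normal equations in symbolic form, a and b may be obtained
as follows:

$b = \frac{\Sigma XY - \bar{X}\,\Sigma Y}{\Sigma X^2 - \bar{X}\,\Sigma X}$,

$a = \bar{Y} - b\bar{X}$.

In the present instance

$\bar{X} = \frac{78}{13} = 6$;  $\bar{Y} = \frac{33,537}{13} = 2,579.7692$.

$b = \frac{\Sigma XY - \bar{X}\,\Sigma Y}{\Sigma X^2 - \bar{X}\,\Sigma X} = \frac{222,475 - 6(33,537)}{650 - 6(78)} = \frac{21,253}{182} = 116.7747$.

$a = \bar{Y} - b\bar{X} = 2,579.7692 - (116.7747)6 = 1,879.121$.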
87 also provides an extra column for recording the trend values; they are
obtained by the usual method of adding to a (1,879.121) successive
amounts of b (116.7747) for each year after 1918.

Thus far, in the tables illustrating the method of least squares, we have
taken the first year (1918) as the X origin. There is, however, an
arithmetic advantage in having the origin at the middle year, which is the
mean of the X values. It will be remembered that the sum of the deviations
from the mean is zero. More generally, it may be stated, the terms of
the normal equations containing sums of the odd powers of X become
zero, an advantage that is especially important for curves of higher degree.
Thus for a straight line

I. $\Sigma Y = Na + b\Sigma X$ becomes $\Sigma Y = Na$;

II. $\Sigma XY = a\Sigma X + b\Sigma X^2$ becomes $\Sigma XY = b\Sigma X^2$.

The normal equations may also be written

I. $a = \frac{\Sigma Y}{N}$,

II. $b = \frac{\Sigma XY}{\Sigma X^2}$.

The first normal equation in this form, therefore, merely states that a
straight line, fitted by the method of least squares, passes through the
point $(\bar{X}, \bar{Y})$. The advantage in having the $\Sigma X$ terms cancel out is that
the two normal equations do not need to be solved simultaneously but
can now be solved separately by a very simple process, thereby saving
considerable labor.

Odd number of items. Table 88 illustrates the procedure of straight
line trend fitting when the X origin is taken at the middle year and when
an odd number of years is used. Zero is placed in the X column opposite
1924, and $\Sigma X$ becomes zero. Again $\Sigma X^2$ is obtained from Appendix M.
In this appendix the sum of the squares of the first six natural numbers is
shown to be 91. Since we have six years on each side of 1924, the value
of $\Sigma X^2$ is 2 × 91 = 182. As indicated below Table 88, the trend
equation is

$Y_c = 2,579.77 + 116.7747X$,

with origin at 1924 and X units of one year. Observe that, after the
trend equation is stated, two essential qualifying statements are recorded:
(1) that the origin was taken at 1924, and thus the value of X = 0 in
1924, and $Y_c = a$ for that year; (2) that b has reference to the normal
increment in production during one year. Were these statements not
made, a person seeing the equation out of context might erroneously
conclude that the normal production for 1918 (the first year of the series)
was 2,579.77 and that the normal monthly increase was 116.7747. Table
88 again provides an extra column for recording the trend values. They
are obtained by adding to or subtracting from a (2,579.77) successive
amounts of b (116.7747). Note that, for 1918, $Y_c$ = 1,879. Obviously,
therefore, if the origin be shifted to 1918, we may state the equation

$Y_c = 1,879 + 116.7747X$,

which is the same equation that was obtained previously.
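As a rough check on the middle-year procedure, the short Python sketch below fits the 1918-1930 data with the origin at 1924. The variable names are illustrative only; the figures are those of the advertising series.

```python
# Average monthly magazine advertising, 1918-1930 (thousands of lines).
Y = [1547, 2142, 2803, 1856, 2030, 2520, 2620,
     2623, 2958, 3038, 3032, 3384, 2984]

N = len(Y)                        # 13 years, an odd number
mid = N // 2                      # position of the middle year, 1924
X = [i - mid for i in range(N)]   # -6, -5, ..., 5, 6, so that sum(X) is zero

sum_x2 = sum(x * x for x in X)    # 2 * 91 = 182
sum_xy = sum(x * y for x, y in zip(X, Y))

# With sum(X) = 0 the normal equations separate and are solved one at a time.
a = sum(Y) / N                    # I.  a = sum Y / N
b = sum_xy / sum_x2               # II. b = sum XY / sum X^2

trend = [a + b * x for x in X]    # Yc for 1918 ... 1930
a_1918 = trend[0]                 # the constant when the origin is shifted to 1918

print(round(a, 2), round(b, 4), round(a_1918, 3))
```

Shifting the origin changes only the constant a; the slope b is the same whichever year is called zero.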

TABLE 88

Fitting Straight Line Trend: Odd Number of Items

(United States magazine advertising data, thousands of lines per month)

Year        X          Y         XY    Subtotal         Yc
1918       -6      1,547     -9,282                  1,879
1919       -5      2,142    -10,710                  1,996
1920       -4      2,803    -11,212                  2,113
1921       -3      1,856     -5,568                  2,229
1922       -2      2,030     -4,060                  2,346
1923       -1      2,520     -2,520     -43,352      2,463
1924        0      2,620          0                  2,580
1925        1      2,623      2,623                  2,697
1926        2      2,958      5,916                  2,813
1927        3      3,038      9,114                  2,930
1928        4      3,032     12,128                  3,047
1929        5      3,384     16,920                  3,164
1930        6      2,984     17,904      64,605      3,280
Total              33,537     21,253

Source: Table 85.

Normal equations:

I. $a = \frac{\Sigma Y}{N} = \frac{33,537}{13} = 2,579.77$.

II. $b = \frac{\Sigma XY}{\Sigma X^2} = \frac{21,253}{182} = 116.7747$.

Trend equation:

$Y_c = 2,579.77 + 116.7747X$.
Origin, 1924. X units, 1 year.


Even number of items. The reader has probably already wondered
whether the procedure described could be applied if the trend were to be
fitted to a period with an even number of years, say 1915-1930. The
procedure is only slightly modified. If the years 1915-1930 inclusive are
used, the middle of the period falls between 1922 and 1923. From this
point of time it is one-half year to the middle of 1922 and one-half year to
the middle of 1923. Since it would be inconvenient to use fractions in
the computations, however, one unit of the independent variable X is taken
to represent six months. Therefore 1922 is labeled -1 and 1923 is labeled
1, as in Table 89. There is, of course, an interval of two 6-month periods

TABLE 89

Computation of Straight Line Trend: Even Number of Items

(United States magazine advertising data, thousands of lines per month)

Year        X          Y         XY    Subtotal         Yc
1915      -15      1,407    -21,105                  1,508
1916      -13      1,669    -21,697                  1,626
1917      -11      1,772    -19,492                  1,745
1918       -9      1,547    -13,923                  1,864
1919       -7      2,142    -14,994                  1,983
1920       -5      2,803    -14,015                  2,102
1921       -3      1,856     -5,568                  2,221
1922       -1      2,030     -2,030    -112,824      2,340
1923        1      2,520      2,520                  2,458
1924        3      2,620      7,860                  2,577
1925        5      2,623     13,115                  2,696
1926        7      2,958     20,706                  2,815
1927        9      3,038     27,342                  2,934
1928       11      3,032     33,352                  3,053
1929       13      3,384     43,992                  3,172
1930       15      2,984     44,760     193,647      3,290
1931*      17      2,409                             3,409
1932*      19      1,763                             3,528
1933*      21      1,555                             3,647
1934*      23      2,027                             3,766
1935*      25      2,115                             3,885
1936*      27      2,378                             4,004
1937*      29                                        4,122
Total              38,385     80,823

* X and Y values for years after 1930 are not used in computing trend.
Source: See Table 85.

Normal equations:

I. $a = \frac{\Sigma Y}{N} = \frac{38,385}{16} = 2,399.06$.

II. $b = \frac{\Sigma XY}{\Sigma X^2} = \frac{80,823}{1,360} = 59.4287$.

Trend equation:

$Y_c = 2,399.06 + 59.4287X$.
Origin, 1922-1923. X units, 1/2 year.


between any two points a year apart; therefore, 1921 is shown as -3,
1924 as 3, and so on. In obtaining a value for $\Sigma X^2$ in this case we must
turn to Appendix N, which shows the sums of squares of odd natural
numbers. The sum of the squares of the first eight odd natural numbers
(1, 3, 5, 7, 9, 11, 13, 15) is shown to be 680; $\Sigma X^2$ is twice that amount,
or 1,360.

The trend equation as shown below Table 89 is

$Y_c = 2,399.06 + 59.4287X$,

with origin between 1922 and 1923, and X units of 1/2 year. In obtaining
the trend value for 1922, we must subtract 59.4287 from 2,399.06; but to
obtain 1921, 1920, etc., we must successively subtract twice this amount,
or 118.8574.
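The half-year coding can be illustrated with a brief Python sketch for the 1915-1930 data. It is offered only as a check on the arithmetic of Table 89; the variable names are hypothetical.

```python
# Average monthly magazine advertising, 1915-1930 (thousands of lines).
Y = [1407, 1669, 1772, 1547, 2142, 2803, 1856, 2030,
     2520, 2620, 2623, 2958, 3038, 3032, 3384, 2984]

N = len(Y)                               # 16 years, an even number
# Odd integers centered on the mid-point of the period: each X unit is
# six months, so consecutive years are two units apart.
X = [2 * i - (N - 1) for i in range(N)]  # -15, -13, ..., 13, 15

sum_x2 = sum(x * x for x in X)           # 2 * 680 = 1,360
sum_xy = sum(x * y for x, y in zip(X, Y))

a = sum(Y) / N                           # value at the 1922-1923 mid-point
b = sum_xy / sum_x2                      # change per half year

trend = {1915 + i: a + b * x for i, x in enumerate(X)}
print(round(a, 2), round(b, 4), round(trend[1922]), round(trend[1921]))
```

Because adjacent years differ by two X units, successive annual trend values differ by 2b, which is the 118.8574 mentioned above.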

Since the main time series illustration running through this book will
deal with magazine advertising for the period 1921 to date, a more
convenient statement of the trend is with origin at 1921. The trend value
for that year is shown by Table 89 to be 2,221 (to six digits, it is 2,220.77).
We may therefore write the equation

$Y_c = 2,220.77 + 118.8574X$,

with origin at 1921 and X units of one year. This permits us to obtain
the trend values following 1921 by successive addition only. This is most
easily done by putting 2,220.77 in the calculating machine and 118.8574
on the keyboard, and recording the trend value each time the value of b
is added. The use of an adding machine necessitates inserting the value
of b and subtotaling to obtain each trend value, but provides a record for
checking against possible errors. Trend values from 1931 through 1937
are recorded in Table 89, although the data for these years were not used
in obtaining the trend equation. Extending the trend in this fashion is a
customary procedure, since it is not practical or desirable to recompute a
complete new trend each year. Extension for 7 years on the basis of 16
years' experience is, of course, somewhat hazardous, but possibly the
results are not so unreasonable as would have been obtained if the very
unusual years of the great depression had been included in our
computations. The fitted trend, with the extension, is shown in Chart 154. The
portion of the trend that has been extended is dotted to distinguish it
from the dashed line representing the trend values based upon
observations. For purposes of comparison, the trend line fitted to the period
1918-1930 is also shown.
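The successive-addition routine described for the calculating machine can be mimicked in Python. The sketch below assumes the equation with origin at 1921 quoted above; `extend_trend` is a hypothetical helper name.

```python
def extend_trend(a, b, first_year, last_year):
    """Start from the trend value a at first_year and add b once per year,
    recording each running total, as on a calculating machine."""
    values = {}
    current = a
    for year in range(first_year, last_year + 1):
        values[year] = current
        current += b
    return values


# Trend with origin at 1921, X units of one year.
trend = extend_trend(2220.77, 118.8574, 1921, 1937)

for year in (1930, 1931, 1937):
    print(year, round(trend[year]))
```

The values for 1931 through 1937 are extrapolations beyond the fitted period, so the caution in the text about extending a trend applies to them.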

We now have two least-squares trends, one for the period 1918-1930,
and another for the period 1915-1930. The question naturally arises:
Which is better? Inspection of Chart 154 reveals that the difference
between the two trends is so slight that for practical purposes it may be
neglected. On logical grounds, however, the one fitted to the longer
period is to be preferred, since measures become more reliable as the number
of observations is increased. Before leaving this section, it is well to lay
down a few generalizations on this point.

(1) In order that the trend equation may be as reliable as possible, 
the maximum number of observations available should be used, provided 
there has been no change in the nature of the trend. If the nature of the 
trend has changed, a second trend (perhaps of a different type) should be 
fitted and perhaps spliced to the old one. The following two consider- 
ations modify this generalization somewhat. 

(2) In order that the slope of the trend may be correct, it is important
that the beginning and end of the series should not be at markedly
different cyclical levels. If the first year is one of prosperity and the last
one of depression, the trend slope will have a downward bias. If the
period begins with depression and ends with prosperity, the bias will be
upward. Under any circumstances a period of extreme prosperity or
depression near either end of the series will impart a bias to the slope in one
direction or the other.

(3) In order that the trend as a whole may have the correct level, the
period to which the trend is fitted should contain about the same area of
prosperity as of depression. For instance, if the first and last years are
peaks of extreme prosperity, the general level of the trend will be too high;
if the first and last years are troughs of extreme depression, the general
level will be too low.

The second and third considerations recede in importance as the period
is increased in length. In the present instance it seemed desirable to have
the first and last years ones of moderate recession. By so doing, a trend
was obtained that is reasonable both as to general level and as to slope;
that is, both a and b are reasonable.

Adaptation of equations to monthly data. Annual data have been used 
in fitting the straight line, although it may be desired to analyze monthly 
data. The process of fitting a trend to monthly data is not different from 
that of fitting it to annual data, but there are 12 times as many values to 
fit to, and the labor is multiplied by more than 12. It is therefore advis- 
able to fit the trend to annual data, and then to transform it to a monthly 
basis. The difference between the two methods is usually negligible. In 
fact, it is probably better to use annual data than monthly data that has 
not been deseasonalized, since the presence of seasonal movement, if very 
violent, may distort the trend. 

It will be recalled that the trend for the annual data for the years
1915-1930, with origin at 1921, is $Y_c = 2,220.77 + 118.8574X$. If the
annual increment is 118.8574, the monthly increment is 118.8574 ÷ 12 =
9.90478. However, there is an added difficulty: 2,220.77 is the value of
$Y_c$ at the end of June 1921, whereas the monthly advertising data were
considered as of the middle of the month. It is 5 1/2 months from the end
of June 1921 back to the middle of January 1921. The $Y_c$ value for January
1921 therefore is 2,220.77 - 5.5(9.90478) = 2,166.29, and the monthly
equation with origin at January 1921 becomes $Y_c = 2,166.29 + 9.90478X$.
Trend values for each month from January 1921 to December 1930 may
now be obtained simply by adding successive increments of 9.90478 to
2,166.29. As a check, the December 1937 figure should be 2,220.77 +
197.5(9.90478) = 4,176.96.
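The conversion just described may be verified with a small Python sketch. It assumes, as the text does, that the annual trend value is centered at the end of June and each monthly value at the middle of its month; the names are illustrative only.

```python
a_annual = 2220.77   # trend value centered at the end of June 1921
b_annual = 118.8574  # increase per year

b_month = b_annual / 12                 # increase per month

# The middle of January 1921 lies 5.5 months before the end of June 1921.
a_jan_1921 = a_annual - 5.5 * b_month


def monthly_trend(year, month):
    """Trend value for the middle of the given month (month = 1..12),
    with origin at January 1921."""
    months_after_origin = (year - 1921) * 12 + (month - 1)
    return a_jan_1921 + b_month * months_after_origin


dec_1937 = monthly_trend(1937, 12)
print(round(b_month, 5), round(a_jan_1921, 2), round(dec_1937, 2))
```

December 1937 is 203 months after January 1921, which is the same point in time as 197.5 months after the end of June 1921, so the two routes to the check figure agree.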

To summarize, we may say that, when the trend is fitted to annual
averages of monthly data and when the number of years is odd, in order
to convert the X units from years to months it is necessary to divide only
the constant b by 12. If the trend has been fitted to an even number of

TABLE 90

United States Magazine Advertising, Expressed as Annual Totals
and as Averages per Month, by Years, 1918-1930

          Total annual            Average monthly
          advertising             advertising
Year      (thousands of lines)    (thousands of lines)
1918      18,569                  1,547
1919      25,702                  2,142
1920      33,638                  2,803
1921      22,271                  1,856
1922      24,365                  2,030
1923      30,233                  2,520
1924      31,442                  2,620
1925      31,473                  2,623
1926      35,491                  2,958
1927      36,453                  3,038
1928      36,379                  3,032
1929      40,606                  3,384
1930      35,804                  2,984

Source: See Table 85.


years, the value for b can be multiplied by 2 to convert it into change per
year, after which the procedure is as above; or the transformation can be
made directly, by dividing b by 6. In the foregoing illustration the trend
has been fitted to average monthly values. Suppose, however, that the
data were total magazine advertising for each year. A simple but
somewhat laborious procedure is to divide the original data by 12 in order to
reduce them to average monthly production, and then to proceed as above.
If it is desired to fit the trend to the annual totals, a somewhat different
procedure must be followed.

The data for United States magazine advertising can be set forth in
either of two ways shown in Table 90: (1) as annual totals; (2) as monthly
averages for each year.

Straight line equations with origin at 1924 are as follows:

Total annual advertising: $Y_c = 30,957.24 + 1,401.2964X$;

Average monthly advertising: $Y_c = 2,579.77 + 116.7747X$.

The second equation is the same as that shown below Table 88. It will
be noticed that the values of a and b are exactly 12 times as great in the
first equation as in the second. This is, of course, logical, since the first
equation deals with annual totals, while the second deals with monthly
averages. Thus, dividing the first equation by 12 changes a from the
normal advertising for the entire year, 1924, to the normal advertising per
month of 1924; and changes b from the normal annual increase in total
advertising per year to the normal increase during the course of a year in
average monthly advertising; as, for instance, the typical change in monthly
advertising from one January to the next January. However, what is
wanted is the typical change in monthly advertising from one month to
the next. In other words, we have converted the Y units of measurement
from an annual to a monthly basis, but not the X units. It is therefore
necessary to divide b through again by 12 (or by 144 altogether). The
equation now becomes

$Y_c = 2,579.77 + 9.7312X$,

with origin at the middle of 1924 (June 30) and X units of one month.
To shift the origin to January 1921, which is 41.5 months before the middle
of 1924, we compute

$Y_c = 2,579.77 - 41.5 (9.7312) = 2,175.93$,

which is the new a, and the equation in the desired form becomes

$Y_c = 2,175.93 + 9.7312X$,

with origin at January 1921 and X units of one month. To summarize,
we may say that, when the trend is fitted to annual totals rather than
averages of monthly data and when the number of years is odd, in order
to convert the X and Y units from annual to monthly terms, it is
necessary to divide the constant a by 12 and the constant b by 144. If the
trend has been fitted to annual totals with an even number of years, the
X units, as has been explained, are six months. The value for b can be
multiplied by 2 to convert into changes per year, after which the
procedure is as above. Or the transformation can be made directly, by
dividing a by 12 and b by 72.

The tabular summary below is in convenient form for ready reference
when we wish to derive, from an equation fitted to annual data, an
equation for use with monthly data.

                            Type of data
Number        Monthly averages              Annual totals
of years      a            b                a               b
Odd           No change    Divide by 12     Divide by 12    Divide by 144
Even          No change    Divide by 6      Divide by 12    Divide by 72

Under all circumstances, shifting the origin to some convenient month
involves an adjustment of half a month.
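The tabular summary can be written as a small Python function. This is a sketch only; `to_monthly` and `shift_origin` are hypothetical names, and the check figures are the equations quoted earlier for annual totals with origin at 1924.

```python
def to_monthly(a, b, *, annual_totals, even_years):
    """Apply the divisors of the tabular summary.

    a, b          constants of a trend fitted to annual data
    annual_totals True if Y is an annual total, False if a monthly average
    even_years    True if the X unit is six months (even number of years)
    Returns (a, b) with Y per month and X units of one month. The origin
    itself is not moved; it stays at the middle of the fitted period.
    """
    a_divisor = 12 if annual_totals else 1
    b_divisor = 6 if even_years else 12   # X units to months
    if annual_totals:
        b_divisor *= 12                   # Y units to a monthly basis
    return a / a_divisor, b / b_divisor


def shift_origin(a, b, months):
    """Trend value at a point `months` after the origin (negative = before)."""
    return a + b * months


# Annual totals, odd number of years, origin at the middle of 1924.
a_m, b_m = to_monthly(30957.24, 1401.2964, annual_totals=True, even_years=False)
# Mid-January 1921 is 41.5 months before the middle of 1924.
a_jan_1921 = shift_origin(a_m, b_m, -41.5)
print(round(a_m, 2), round(b_m, 4), round(a_jan_1921, 2))
```

The half-month adjustment mentioned above appears here as the fractional 41.5 in the origin shift.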

Other Simple Types of Trends

Series of curves. Occasionally no single curve will seem adequately to 
describe the trend. The electric power production series illustrated in 
Chart 153 is perhaps an illustration, but there the trend is shown as being 
discontinuous. A better illustration is Chart 155, in which are shown 
two connected straight lines fitted to average tractive power of steam 
locomotives. The first line is a least-squares fit to the 1923-1929 data, 
while the second is to the 1929-1935 data. This gives two trend values 
for 1929: 44.73 by the former equation, and 44.63 by the latter. In this 
case, inspection of the chart seems to show that a better trend would 
result if the latter figure were used. The splicing together of two trends 
is always a highly subjective procedure, and no general rules can be laid 
down for its accomplishment. The use of a series of curves is applicable 
not only to straight line trends but to any other type (or types) of curve. 
See, for instance, Chart 207. It is, however, better to avoid this method 
unless it is strongly supported by the appearance of the chart, and prefer- 
ably also by the logic of the situation. 

Related series as trend. Sometimes the trend of a series that has
considerable amplitude of fluctuation can be described by the actual values
of some related series that is more stable. Thus, bond yields are found
to provide a not unreasonable trend for commercial paper rates, as shown
in Chart 156. This method by itself, however, does not have very wide
applicability. It should, of course, be used only when logically justified.

A modification of this method, which is perhaps more widely applicable,
is to utilize some other series as part of the trend. Thus, Warren M.
Persons used population growth as one element in the trend of Barron's
Annual Index of Production and Trade. His method is illustrated in
Table 91. First the unadjusted index numbers are divided by figures
representing the population of the United States relative to 1923-1925,

Chart 155. Series of Straight Lines as Trend of Average Traction Power of Steam
Locomotives, 1923-1935. (Vertical axis: thousands of pounds. Data from Committee
on Public Relations of Eastern Railroads, A Yearbook of Railroad Information,
1936 Edition, p. 6.)

in order to obtain an index adjusted for population growth. This pro- 
cedure has been referred to also in Chapters VII and XIV. The unad- 
justed index and the population curve are shown in part A of Chart 157, 
while the adjusted data and straight line trend are shown in part B of this 
chart. The straight line now seems to provide a good trend for these data. 
The straight line trend values are recorded in column 5 of Table 91. The 
trend values are the product of the two separate elements; that is, they 
are obtained by multiplying together the straight line values and the 
population relatives. The final results are shown in part C of Chart 157. 
The trend line, while not quite so smooth as the usual mathematical curve, 
seems to be a good description of the trend of this series. 
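The arithmetic of this population-adjusted trend can be sketched in Python for a few years of Table 91. The function name `persons_trend` is hypothetical, and the straight line values are simply taken from column 5 of the table rather than refitted.

```python
# (year, unadjusted index, population relative, straight line trend value),
# that is, columns 2, 3 and 5 of Table 91 for four sample years.
rows = [
    (1904, 48.3, 72.2, 69.47),
    (1905, 57.8, 73.6, 70.72),
    (1913, 72.4, 85.0, 80.69),
    (1925, 106.1, 101.6, 95.64),
]


def persons_trend(index, population_relative, line_value):
    """Return (index adjusted for population, final trend value,
    index as per cent of final trend)."""
    adjusted = 100 * index / population_relative          # column 4
    final_trend = population_relative * line_value / 100  # column 6
    pct_of_trend = 100 * index / final_trend              # column 7
    return adjusted, final_trend, pct_of_trend


for year, index, pop, line in rows:
    adjusted, final_trend, pct = persons_trend(index, pop, line)
    print(year, round(adjusted, 2), round(final_trend, 1), round(pct, 1))
```

Since the population relative cancels, the per cent of trend is the same as 100 times column 4 divided by column 5. Figures computed this way may differ by a few tenths from the printed column 7, which appears to have been worked from rounded trend values.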

Chart 156. Bond Yields and Commercial Paper Rates, 1919-1936. (Data from
Frederick R. Macaulay, Bond Yields, Interest Rates, and Stock Prices,
Publication No. 33, National Bureau of Economic Research, New York, 1938,
pp. AldT-Albl.)

Chart 157. Combination of Population Relatives and Straight Line Values to Obtain
Trend for Barron's Index of Production and Trade. (Final trend line of Section C is
product of the trend lines of Sections A and B; data are from Table 91.)

Cyclical averages. When it is desired to measure cyclical deviations
from a trend so obtained that the positive and negative portions of each
cycle will be approximately equal, a number of methods are available.
The moving average method has already been described. Closely related


TABLE 91

Use of Population Estimates and Straight Line in Computing Trend Values
to Barron's Index of Trade and Production

Column (1): Year
Column (2): Unadjusted production index (1923-1925 = 100)
Column (3): Population of U. S. relative to 1923-1925 (per cent)
Column (4): Index adjusted for population growth [Col. 2 ÷ Col. 3]
Column (5): Straight line trend fitted to data of Col. 4*
Column (6): Final trend values [Col. 3 × Col. 5]
Column (7): Production index adjusted for trend [Col. 2 ÷ Col. 6]†

(1)       (2)       (3)       (4)       (5)       (6)       (7)
1904     48.3      72.2     66.90     69.47      50.2      96.2
1905     57.8      73.6     78.53     70.72      52.0     111.1
1906     60.8      75.0     81.07     71.97      54.0     113.1
1907     60.0      76.5     78.43     73.21      56.0     107.0
1908     52.1      77.9     66.88     74.46      58.0      89.7
1909     63.2      79.2     79.80     75.70      60.0     105.5
1910     64.2      80.9     79.36     76.95      62.3     102.9
1911     62.7      82.1     76.37     78.20      64.2      97.8
1912     70.4      83.4     84.41     79.44      66.2     106.0
1913     72.4      85.0     85.18     80.69      68.6     105.4
1914     67.0      86.6     77.37     81.93      71.0      94.3
1915     72.9      87.8     83.03     83.18      73.1      99.6
1916     84.6      89.1     94.95     84.43      75.2     112.3
1917     83.7      90.3     92.69     85.67      77.4     107.9
1918     82.6      91.3     90.47     86.92      79.3     103.8
1919     80.2      91.9     87.27     88.16      81.2      99.8
1920     82.7      93.1     88.83     89.14      83.0      99.7
1921     68.3      94.8     72.05     90.66      86.0      78.8
1922     84.9      96.5     87.98     91.90      88.7      95.8
1923     99.0      98.2    100.81     93.15      91.5     107.9
1924     95.0     100.1     94.91     94.39      94.6     100.4
1925    106.1     101.6    104.43     95.64      97.1     109.2
1926    110.1     103.1    106.79     96.89      99.8     110.5
1927    108.7     104.4    104.12     98.13     102.5     106.5
1928    113.2     105.6    107.20     99.38     105.0     108.7
1929    116.8     107.2    108.96    100.65     107.9     109.4
1930     95.6     108.8     87.87    101.85     110.9      87.6
1931     79.5     109.5     72.60    103.15     113.0      71.6
1932     60.6     110.3     54.94    104.35     114.9      53.4
1933     68.3     111.1     61.48    105.65     117.3      58.2
1934     73.4     111.8     65.65    106.85     119.4      61.5
1935     81.5     112.6     72.38    108.15     121.7      66.9
1936     96.8     113.4     85.36    109.35     124.0      78.2

* Fitted to 1899-1931 data.
† This column can be obtained more easily by dividing column 4 by column 5.
Source: Data furnished by Barron's, The National Financial Weekly.


is the following method, which consists basically in obtaining one or more
typical points for each cycle and connecting such points by a straight line.
This method is highly subjective, and depends for its validity upon the
ability of the statistician to locate the high point and the low point of
each cycle. Although annual data are used in this illustration, monthly
data can be used at least equally well. In case monthly data are used,
it is well, however, to smooth them by means of a 12-month centered
moving average, in order to iron out seasonal variations and accidental
peaks and troughs, either of which might be confused with cyclical turning
points.

TABLE 92

Trend Line by High-Low Mid-Point Method Fitted to Passenger Automobile
Production, 1917-1938

Column (1): Year
Column (2): Average monthly production (thousands of cars)
Column (3): Highs with interpolations
Column (4): Lows with interpolations
Column (5): High-low mid-points [Average of Col. 3 and Col. 4]

(1)      (2)            (3)            (4)            (5)
1917     145.5 (H)      145.5 (H)
1918      78.6 (L)      149.9           78.6 (L)      114.2
1919     138.1          154.4           92.5          123.4
1920     158.8 (H)      158.8 (H)      106.3          132.6
1921     120.2 (L)      206.6          120.2 (L)      163.4
1922     189.5          254.3          168.6          211.4
1923     302.1 (H)      302.1 (H)      217.1          259.6
1924     265.5 (L)      306.5          265.5 (L)      286.0
1925     311.3          310.9          258.6          284.8
1926     315.3 (H)      315.3 (H)      251.6          283.4
1927     244.7 (L)      337.6          244.7 (L)      291.2
1928     318.0          360.0          214.7          287.4
1929     382.3 (H)      382.3 (H)      184.7          283.5
1930     232.1          375.3          154.6          265.0
1931     164.4          368.3          124.6          246.4
1932      94.6 (L)      361.3           94.6 (L)      228.0
1933     131.1          354.3          106.6          230.4
1934     181.5          347.3          118.6          233.0
1935     271.0          340.3          130.6          235.4
1936     305.8          333.3          142.7          238.0
1937     326.3 (H)      326.3 (H)      154.7          240.5
1938     166.7 (L)*                    166.7 (L)

(H) = Cyclical High. (L) = Cyclical Low.
* 1938 is taken tentatively as a cyclical low.
Source: Production data from Department of Commerce, Survey of Current Business,
1936 Supplement, p. 147; 1938 Supplement, p. 160; February 1939, p. 95.


One method, which is used by the Cleveland Trust Company, is
illustrated in Table 92 and Chart 158, and may be called the high-low mid-point
method. The procedure is as follows:

(1) Determine the high point of each cycle.

(2) Connect the high points by straight lines. In Chart 158 these are
light dashed lines.

(3) Determine, by arithmetic interpolation, the values on this line for
each year (see column 3 of Table 92).

(4) Determine the low point of each cycle; connect the low points by
straight lines and interpolate (see columns 2 and 4, Table 92).

(5) Average the high and low values for each year, thus obtaining the
mid-points which are shown by the heavy dashed line on Chart 158 and
by the values in column 5 of Table 92.
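The five steps can be sketched in Python, taking the cyclical highs and lows marked (H) and (L) in Table 92 as given. Choosing those turning points is the subjective part of the method and is not automated here; the function name `interpolate` is hypothetical.

```python
# Cyclical turning points from Table 92 (average monthly production,
# thousands of cars). Their selection is a matter of judgment.
highs = {1917: 145.5, 1920: 158.8, 1923: 302.1,
         1926: 315.3, 1929: 382.3, 1937: 326.3}
lows = {1918: 78.6, 1921: 120.2, 1924: 265.5,
        1927: 244.7, 1932: 94.6, 1938: 166.7}


def interpolate(points):
    """Join consecutive turning points by straight lines and return the
    value on that broken line for every year between first and last."""
    years = sorted(points)
    line = {}
    for y0, y1 in zip(years, years[1:]):
        v0, v1 = points[y0], points[y1]
        for year in range(y0, y1 + 1):
            line[year] = v0 + (v1 - v0) * (year - y0) / (y1 - y0)
    return line


high_line = interpolate(highs)   # column 3
low_line = interpolate(lows)     # column 4

# Column 5: average of the two lines for years covered by both.
midpoints = {year: (high_line[year] + low_line[year]) / 2
             for year in sorted(high_line) if year in low_line}

for year in (1918, 1919, 1930, 1937):
    print(year, round(midpoints[year], 1))
```

Results may differ from column 5 by about a tenth, since the table appears to average the interpolated values after rounding them.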

Chart 158. Trend by High-Low Mid-Point Method Fitted to Passenger Automobile
Production, 1917-1938. (Vertical axis: thousands of cars. Data of Table 92.)

Variations of this method are possible, but will not be illustrated here.
Thus, instead of connecting the high points of each cycle and the low
points, we might connect the average value of each cycle, such points
being centered at the middle of each cycle (see Babson's "X-Y line" of
Chart 255 and explanation on page 814); or the points might be the
averages of half cycles, running from high to low and from low to high.

All of these cyclical average methods are open to the possible objection
that, first of all, the statistician decides what fluctuations he wishes to
identify as cycles, and then chooses his high and low points so that the
trend will go through these cycles. These methods, therefore, while
simple, require even more judgment than do moving averages, and, as is the
case with the moving average method, a trend can never be up to date.
Another possible objection is that the resulting curve is not very smooth.
The curve may, however, be looked upon as a first approximation to be
smoothed either freehand or by a mathematical curve. Finally, it may
be objected that cyclical average curves do not look like trends at all.
In reality they are not primary trends, but resemble more closely the
combined primary-secondary trends, which were mentioned briefly in
Chapter XIV.


Selecting the Type of Trend 

(1) As a first step the data should always be plotted and a curve fitted
tentatively by inspection. The plotting should be on semi-logarithmic
paper as well as arithmetic. Should the trend appear to be of a simple
type on semi-logarithmic paper, then one of the methods to be described
in the following chapter may be appropriate.

Chart 159. United States Magazine Advertising Adjusted for Trend, and Barron's
Index of Production and Trade, by Years, 1915-1936. (Vertical axis: per cent.
Data of Tables 91 and 93.)

(2) If the object of the analysis is solely to measure cyclical deviations, 
a reasonably smooth curve should be selected which approximately passes 
through the center of the different cycles. This may be accomplished by 
one or more straight lines, or a moving average or line of cyclical averages 
may be useful. A primary trend, likewise, should pass through the center 
of the secondary waves. 

(3) If the object is analysis of the trend itself, or prediction, a
mathematical equation should be used. Such an equation is also a concise way
of defining the trend.

(4) A curve should be logical in the sense that it behaves in a manner
which seems reasonable when we consider the forces affecting the series.
The use of a moving average for a primary trend would seem to confess
lack of complete understanding of, or hypothesis concerning, the social
forces at work. A straight line also may not be logically satisfying. On
the other hand, population growth is a logical explanation of part of the
long term growth of many series. It is quite likely, however, that more
complicated curves, such as are described in Chapter XVI, will be required
to satisfy this criterion than are described in this chapter.

(5) In case there is difficulty in deciding which among several equation
types is most logical, objective tests (which will be described in Chapter
XVI) may be applied which purport to show the mathematical law that
the series approximates.


TABLE 93

Adjustment of United States Magazine Advertising Data
for Trend, 1915-1937

(Original data and trend values in thousands of lines. Trend line fitted to
data for 1915-1930 and extended from 1930 to date.)

         Original     Trend      Per cent
Year       data       values     of trend
            Y           Yc      100(Y / Yc)

1915      1,407       1,508         93
1916      1,669       1,626        103
1917      1,772       1,745        102
1918      1,547       1,864         83
1919      2,142       1,983        108
1920      2,803       2,102        133
1921      1,856       2,221         84
1922      2,030       2,340         87
1923      2,520       2,458        103
1924      2,620       2,577        102
1925      2,623       2,696         97
1926      2,958       2,815        105
1927      3,038       2,934        104
1928      3,032       3,053         99
1929      3,384       3,172        107
1930      2,984       3,290         91
1931      2,409       3,409         71
1932      1,763       3,528         50
1933      1,555       3,647         43
1934      2,027       3,766         54
1935      2,115       3,885         54
1936      2,378       4,004         59
1937      2,671       4,122         65

Source: Table 89.


Adjustment for Trend 

It was suggested that one object of measuring trend is to measure cyclical
deviations from it. If we are dealing with monthly data, a first step in
obtaining such cycles may be to express the data as percentages of trend
by dividing the original data by the monthly trend values (and multiplying
by 100). But, in order completely to isolate cycles, we must also
eliminate seasonal and irregular movements. Since it is difficult to see
what direct use can be made of data adjusted for trend alone (while, on
the other hand, seasonally adjusted data are in common use), it is customary
to eliminate the seasonal first and then the trend. Consequently,
in this chapter, adjustment of monthly data for trend is not made. However,
the magazine advertising annual data are divided* by their trend
values, giving the annual values of the cyclical movements. These cyclical
relatives based on annual data are rough measures and can be used in
comparison with other annual data similarly adjusted. The process of
computation is shown in Table 93, and the resulting cyclical relatives are
plotted in Chart 159. This series shows much the same peaks and troughs
as most economic series, but indicates a tendency to lag behind general
business. Note that cyclical troughs occurred in 1925, 1928, and 1933,
rather than in 1924, 1927, and 1932.
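
The adjustment just described can be sketched in a few lines of Python. The figures are those of Table 93; the function name `percent_of_trend` is only an illustrative choice, and rounding to the nearest whole per cent is assumed because that is how the table appears to be printed.

```python
# Figures from Table 93 (thousands of lines), 1915-1937.
years = list(range(1915, 1938))
original = [1407, 1669, 1772, 1547, 2142, 2803, 1856, 2030, 2520, 2620, 2623,
            2958, 3038, 3032, 3384, 2984, 2409, 1763, 1555, 2027, 2115, 2378,
            2671]
trend = [1508, 1626, 1745, 1864, 1983, 2102, 2221, 2340, 2458, 2577, 2696,
         2815, 2934, 3053, 3172, 3290, 3409, 3528, 3647, 3766, 3885, 4004,
         4122]


def percent_of_trend(y, yc):
    """Return 100 * Y / Yc for each pair, rounded to the nearest per cent."""
    return [round(100 * a / b) for a, b in zip(y, yc)]


relatives = percent_of_trend(original, trend)
for year, rel in zip(years, relatives):
    print(year, rel)
```

Dividing rather than subtracting keeps the deviations relative to the level of the trend, which is what allows them to be compared with other series adjusted in the same way.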

Selected References

E. C. Bratt: Business Cycles and Forecasting, Chapter III; Business Publications,
Inc., Chicago, 1937. Mainly a consideration of economic factors.

R. E. Chaddock: Principles and Methods of Statistics, pages 306-336; Houghton
Mifflin Co., Boston, 1925. Trend fitting is considered as a special application
of correlation analysis.

F. E. Croxton and D. J. Cowden: Practical Business Statistics, Chapter XV;
Prentice-Hall, Inc., New York, 1934.

F. C. Mills: Statistical Methods Applied to Economics and Business (Revised
Edition), pages 231-253; Henry Holt and Co., New York, 1938. Moving averages
are fitted to hypothetical data, clearly illustrating the principles involved.

E. C. Rhodes: Elementary Statistical Methods, pages 211-233; George Routledge and
Sons, London, 1933. Moving averages.

C. H. Richardson: An Introduction to Statistical Analysis, Chapter VI; Harcourt,
Brace and Co., New York, 1934. Applies the method of least squares and
the method of moments to fitting linear trends.

* We could subtract the trend rather than divide by it. This would give absolute
rather than relative deviations. For most purposes, however, it is more useful to
know whether the variations are large relative to some logical base, such as the trend.
Thus, a deviation of 50 is ten times as important when judged with respect to a
trend value of 200 as it is when compared with a trend value of 2,000.



CHAPTER XVI 


OTHER TREND TYPES 


In Chapter XV only the simplest type of trend equation was discussed:
the straight line. Frequently it is the case that for short periods of time
the straight line gives a reasonably good fit. When viewed over a long
period of time, however, many series do not appear to follow so simple a
"law." Usually the slope is gradually changing; even the change in the
slope may be changing. It is apparent, for instance, that the growth
of rayon consumption since 1919 is not adequately described by a
straight line.¹ Even a casual inspection of Chart 160 reveals that any
adequate trend line for these data must have at least one bend in it.

It is the object of this chapter to describe the properties of several such
equations, to give directions for their fitting, and to explain in somewhat
greater detail how to select from among the numerous types at the
disposal of the statistician.

Weighted Moving Averages 

Before describing more equation types, however, it is well to note that
smoother and more flexible results can be obtained by the introduction of
weights into moving averages than can be had by the use of ordinary
unweighted moving averages. A centered 2-year moving average, it was
noted, could also be thought of as a weighted 3-year moving average in
which the middle year is given a weight of two. A system of weights often
used is known as binomial. Thus, $(a + b)^2 = a^2 + 2ab + b^2$, giving the
weights 1, 2, 1. Binomially, weights for a 5-year moving average can be
obtained from $(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$, the weights
being 1, 4, 6, 4, 1. For 7 years they are 1, 6, 15, 20, 15, 6, 1.

It would be laborious to compute directly a 7-year binomially weighted
moving average, but a short cut is possible. Compute first a weighted
3-year moving average; then take a 3-year weighted moving average of
the result, and another weighted 3-year moving average of this result.²
This procedure is followed in Table 94. In general language, for a binomial
of N terms, take a binomially weighted 3-term moving average
successively $\frac{N-1}{2}$ times.

1 Although data are available as far back as 1911, it does not seem advisable in this
instance to fit one curve to the whole period. Judging from plotted data, it appears
that the World War so interrupted the early growth of the industry that it practically
had a fresh start in 1919.

2 The logic of this procedure is perhaps most easily understood by a careful study of
the table on page 423, the construction of which is self-explanatory.

Chart 160. Binomially Weighted Moving Average Trends Fitted to United States
Rayon Consumption, 1919-1937. (For purposes of comparison the different curves
have been plotted close together on the same chart instead of on separate charts. Each
curve is plotted to the same vertical scale, but at a different level. This arrangement
is sometimes referred to as a multiple axis chart. Data of Table 94.)

It is very easy to compute a weighted average at one operation on a
calculating machine of the Monroe or Marchant type. The 3-year moving
average (11,620, in Table 94) is obtained as follows:


(1) Set machine for multiplication.
(2) Put in keyboard 9,291; depress plus bar; clear keyboard.
(3) Put in keyboard 8,718; depress plus bar twice; clear keyboard.
(4) Put in keyboard 19,751; depress plus bar; clear keyboard.
(5) Total in lower dial is 46,478.
(6) Put in keyboard 46,478; clear both dials.
(7) Multiply by .25, the reciprocal of 4, obtaining 11,620.
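
The short cut of repeated 3-term smoothing can be checked with a small Python sketch. The function names are illustrative only, and the consumption figures are those given in Table 94. The second function applies the binomial weights directly, so that the two routes can be compared; unrounded intermediate values are kept here, so results may differ from the printed table by a unit in the last place.

```python
from math import comb


def binomial_3_term(values):
    """One pass of the 1, 2, 1 weighted average; one value is lost at each end."""
    return [(values[i - 1] + 2 * values[i] + values[i + 1]) / 4
            for i in range(1, len(values) - 1)]


def binomial_moving_average(values, n_terms):
    """N-term binomially weighted average by (N - 1) / 2 successive passes."""
    if n_terms < 3 or n_terms % 2 == 0:
        raise ValueError("n_terms must be an odd number of at least 3")
    smoothed = list(values)
    for _ in range((n_terms - 1) // 2):
        smoothed = binomial_3_term(smoothed)
    return smoothed


def direct_binomial_average(values, n_terms):
    """The same average computed in one step from the binomial weights."""
    weights = [comb(n_terms - 1, k) for k in range(n_terms)]
    total = sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i:i + n_terms])) / total
            for i in range(len(values) - n_terms + 1)]


# United States rayon consumption, 1919-1937 (thousands of pounds), Table 94.
rayon = [9291, 8718, 19751, 24747, 32558, 42243, 58277, 60630, 100048,
         100101, 131448, 117968, 157360, 152041, 211883, 194771, 252676,
         297594, 261195]

three_year = binomial_moving_average(rayon, 3)
seven_year = binomial_moving_average(rayon, 7)
print(three_year[0])   # value centered on 1920
print(seven_year[0])   # value centered on 1922
```

Three passes of the 1, 2, 1 pattern give the same result as the weights 1, 6, 15, 20, 15, 6, 1 applied at once, which is why the successive method saves labor without changing the trend.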


TABLE 94

Computation of Binomially Weighted 7-Year Moving Average of United States
Rayon Consumption, 1919-1937

(Thousands of pounds)

                                 Moving average
Year     Consumption     3-year      5-year      7-year

1919         9,291
1920         8,718       11,620
1921        19,751       18,242      18,389
1922        24,747       25,451      25,542      25,826
1923        32,558       33,026      33,833      34,274
1924        42,243       43,830      43,886      44,366
1925        58,277       54,857      55,860      56,705
1926        60,630       69,896      71,214      71,962
1927       100,048       90,207      89,558      89,226
1928       100,101      107,924     106,574     105,651
1929       131,448      120,241     119,898     119,642
1930       117,968      131,186     132,199     133,066
1931       157,360      146,182     147,970     149,253
1932       152,041      168,331     168,872     169,375
1933       211,883      192,644     191,786     192,431
1934       194,771      213,525     217,281     218,440
1935       252,676      249,429     247,412
1936       297,594      277,265
1937       261,195

Source: See Table 80.


It is well to plot the original data as well as the smoothed data after 
each successive smoothing as in Chart 160. This will accomplish the 
two-fold object of bringing to light any large errors in computation, and 
telling when the data have been sufficiently smoothed. 

The greater smoothness of weighted moving averages is due to the fact
that an item exerts slight influence on the final average when it is first
included, but it gradually grows more powerful, and then as gradually
dwindles in importance until it finally disappears. This will be clear if
we examine the weight pattern of a binomial of 15 items, which can be
obtained by seven smoothings of original data by weighted 3-item moving
averages. The weight pattern is: 1; 14; 91; 364; 1,001; 2,002; 3,003;
3,432; 3,003; 2,002; 1,001; 364; 91; 14; 1. Thus any member, when it
appears for the first or last time, exerts only 1/16,384 of the influence exerted
by all the numbers (16,384 being the sum of all the frequencies in the
weight pattern), while in a simple moving average the initial or terminal
influence is 1/15. On the other hand, greater flexibility is obtained by the
fact that the maximum influence obtained by any one number in a binomial
of 15 is 3,432/16,384 as compared with 1/15 for the unweighted average. Chart
161 is a diagram of the above weight pattern. A binomial weighting system
is but one type of weight pattern, some types even introducing negative
weights in certain portions. In general it may be said that the
smoother the weight pattern, the smoother the resulting trend. Since a
binomial weight pattern is very smooth, so also is the trend resulting from
the use of it.

Chart 161. Weight Diagram of a 15-Item Binomially Weighted Moving Average. The
weights total 16,384.
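
The weight pattern quoted above can be generated rather than copied, which guards against slips in transcription. The following is a minimal sketch; `binomial_weights` is an illustrative name, and the weights are taken to be the binomial coefficients of order N - 1.

```python
from fractions import Fraction
from math import comb


def binomial_weights(n_items):
    """Weights of an n-item binomially weighted moving average."""
    return [comb(n_items - 1, k) for k in range(n_items)]


weights = binomial_weights(15)
total = sum(weights)

# Share of the total influence held by an end item and by the middle item.
end_share = Fraction(weights[0], total)
peak_share = Fraction(max(weights), total)

print(weights)
print(total)
print(end_share, float(peak_share), float(Fraction(1, 15)))
```

The end item carries far less than the 1/15 it would have in a simple average, while the middle item carries about three times as much, which is the contrast the paragraph draws.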

Because of the irregularity of cyclical movements it is usually not readily 
apparent what length of moving average to use. It is therefore convenient 
merely to smooth the data by successive moving averages until the desired 
smoothness is obtained. Seven years seems sufficient for the rayon data. 
One objection to a weighted moving average similar to those discussed is 
that it requires so many terms to smooth out undesirable minor wave-like 
movements, thus removing so many end values from the trend. It seems
not unreasonable to contend, therefore, that for most purposes sufficiently
good results will be obtained by using a smaller number of years in a simple 
moving average, and then smoothing the results by a small term binomial 
moving average, or on a chart freehand. Several other objections to mov- 
ing average trends, regardless of the weighting system, were discussed in 
Chapter XV. It is worth repeating that no expression for the trend is 
available when we use moving averages. 

Simple Polynomials 

This family of curves has as its most elementary type the straight line, 
which, it will be remembered, has two constants. As will be explained, 
additional constants introduce one or more bends into the curve. Below 
are given the equation types of the five simplest varieties: 

First degree (straight line):  $Y_c = a + bX$.

Second degree (parabola):      $Y_c = a + bX + cX^2$.

Third degree (cubic):          $Y_c = a + bX + cX^2 + dX^3$.

Fourth degree (quartic):       $Y_c = a + bX + cX^2 + dX^3 + eX^4$.

Fifth degree (quintic):        $Y_c = a + bX + cX^2 + dX^3 + eX^4 + fX^5$.

Fourth or fifth degree curves, which may change in slope from positive to 
negative direction, or from negative to positive direction, respectively three 
and four times, hardly coincide with the concept of primary trend as set 
forth previously, and only the second degree curve will be explained in 
detail here. 

Second degree curve. This curve is but one degree more complicated 
than a straight line. It differs in that its slope is continually changing in 
such a way that the curve has one bend. If a sufficient number of X 
values are included, it is inclined positively in one part and negatively in 
another. Eight of these curves are shown in Chart 162. 

The mechanics of determining the Yc values are not difficult, and will 
be illustrated for the curve shown in section 1 of the chart referred to. 
The values of a, b, and c are taken to be 5, 2, and -.3 respectively; and
the equation is therefore

$Y_c = 5 + 2X - .3X^2$.

When X is 0,

$Y_c = 5 + 2(0) - .3(0)^2 = 5$.

When X is -4,

$Y_c = 5 + 2(-4) - .3(-4)^2 = 5 - 8 - 4.8 = -7.8$.

In like manner other values may be substituted, with the results that are 
tabulated at the right of this curve in Chart 162. 

Chart 162. Second Degree Equations and Curves.

The meaning of the constants a, b, and c as applied to second degree
curves may now be summarized: a indicates the Yc value when X = 0;
b indicates the amount and direction of the slope at the point where
X = 0; 2c indicates the amount of change in the slope per unit of X, and
whether this curvature is such as to make it slope upward or downward
when the value of X is taken as large and positive.
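
As a small illustration of these meanings, the sketch below evaluates the curve with a = 5, b = 2, and c = -.3 and also its slope, which for a second degree curve is b + 2cX. The function names are illustrative, not part of the text.

```python
def second_degree(x, a=5.0, b=2.0, c=-0.3):
    """Trend value Yc = a + bX + cX^2."""
    return a + b * x + c * x ** 2


def slope(x, b=2.0, c=-0.3):
    """Slope of the curve at X, namely b + 2cX."""
    return b + 2 * c * x


for x in (-4, 0, 4):
    print(x, round(second_degree(x), 1), round(slope(x), 1))
```

At X = 0 the value is a and the slope is b; each unit increase in X changes the slope by 2c, here -.6, so the curve eventually turns downward.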

TABLE 95

Second Degree Curve Fitted to United States Rayon Consumption

(Thousands of pounds)

                                                               Computation of trend values
              Consumption                                                                   Trend value
Year     X         Y            XY           X^2 Y       X^2      a + bX        cX^2            Yc

1919   -17       9,291      -157,947      2,685,099      289    -44,795.5     54,397.1        9,602
1920   -15       8,718      -130,770      1,961,550      225    -29,020.0     42,350.7       13,331
1921   -13      19,751      -256,763      3,337,919      169    -13,244.6     31,810.1       18,566
1922   -11      24,747      -272,217      2,994,387      121      2,530.9     22,775.2       25,306
1923    -9      32,558      -293,022      2,637,198       81     18,306.3     15,246.2       33,552
1924    -7      42,243      -295,701      2,069,907       49     34,081.8      9,223.0       43,305
1925    -5      58,277      -291,385      1,456,925       25     49,857.2      4,705.6       54,563
1926    -3      60,630      -181,890        545,670        9     65,632.6      1,694.0       67,327
1927    -1     100,048      -100,048        100,048        1     81,408.1        188.2       81,596
1928     1     100,101       100,101        100,101        1     97,183.5        188.2       97,372
1929     3     131,448       394,344      1,183,032        9    112,959.0      1,694.0      114,663
1930     5     117,968       589,840      2,949,200       25    128,734.4      4,705.6      133,440
1931     7     157,360     1,101,520      7,710,640       49    144,509.9      9,223.0      153,733
1932     9     152,041     1,368,369     12,315,321       81    160,285.3     15,246.2      175,532
1933    11     211,883     2,330,713     25,637,843      121    176,060.8     22,775.2      198,836
1934    13     194,771     2,532,023     32,916,299      169    191,836.2     31,810.1      223,646
1935    15     252,676     3,790,140     56,852,100      225    207,611.6     42,350.7      249,962
1936    17     297,594     5,059,098     86,004,666      289    223,387.1     54,397.1      277,784

Total          1,972,105    15,286,405    243,457,905

Source: See Table 80.


II.   15,286,405 = 1,938b.
      b = 7,887.722.

I.    1,972,105 = 18a + 1,938c.
III.  243,457,905 = 1,938a + 374,034c.

(I × 107.66667)   212,329,978 = 1,938a + 208,658.01c
III.              243,457,905 = 1,938a + 374,034.00c
                   31,127,927 = 165,375.99c
                            c = 188.22519.

I.    1,972,105 = 18a + 1,938(188.22519)
      18a = 1,607,324.58
      a = 89,295.810.

Check (III):
      243,457,905 = 1,938(89,295.810) + 374,034(188.22519)
                  = 243,457,900.6.

Trend equation:
      $Y_c = 89,295.810 + 7,887.722X + 188.22519X^2$.
      Origin, 1927-1928. X units, ½ year.

Since there are three unknowns or constants, three normal equations
(each with three constants) are required for a second degree curve:

I.    $\Sigma Y = Na + b\Sigma X + c\Sigma X^2$.

II.   $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3$.

III.  $\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4$.

When we are dealing with time series, however, and the origin is taken at
the middle of the period, the odd powers of X, of course, total zero, and
the equations become:

I.    $\Sigma Y = Na + c\Sigma X^2$.

II.   $\Sigma XY = b\Sigma X^2$.

III.  $\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4$.

The only values which must be computed are therefore $\Sigma Y$, $\Sigma XY$, and $\Sigma X^2Y$,
since $\Sigma X^2$ and $\Sigma X^4$ can be obtained from Appendix M or N.

The computation of these values and the solution of the equations are
shown in Table 95. It is to be noted that only equations I and III need
to be solved simultaneously, since equation II has only one unknown (b)
and equations I and III have two each (a and c). A word of explanation
concerning the simultaneous solution of equations I and III is advisable.
Dividing the coefficient of a in equation I into that of a in equation
III gives 107.66667. Therefore, in order to cancel out a, equation I is
multiplied by 107.66667. Then equation I is subtracted from equation III,
the result being 31,127,927 = 165,375.99c, from which c is calculated to
be 188.22519. This value of c is now substituted in equation I, and a
value for a obtained (89,295.810).
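
The same solution can be sketched in Python, assuming the consumption figures of Table 95 and X measured in half-year units from an origin between 1927 and 1928. Variable names are illustrative. Because the machine computation rounds the multiplier to 107.66667, the constants obtained here may differ from the printed ones in the last digit or two.

```python
# United States rayon consumption, 1919-1936 (thousands of pounds), Table 95.
rayon = [9291, 8718, 19751, 24747, 32558, 42243, 58277, 60630, 100048,
         100101, 131448, 117968, 157360, 152041, 211883, 194771, 252676,
         297594]

n = len(rayon)
x = [2 * i - (n - 1) for i in range(n)]          # -17, -15, ..., 15, 17

sum_y = sum(rayon)
sum_xy = sum(xi * yi for xi, yi in zip(x, rayon))
sum_x2y = sum(xi ** 2 * yi for xi, yi in zip(x, rayon))
sum_x2 = sum(xi ** 2 for xi in x)
sum_x4 = sum(xi ** 4 for xi in x)

# Equation II has a single unknown.
b = sum_xy / sum_x2

# Equations I and III: multiply I by (sum_x2 / n) and subtract from III.
factor = sum_x2 / n
c = (sum_x2y - factor * sum_y) / (sum_x4 - factor * sum_x2)
a = (sum_y - c * sum_x2) / n

trend = [a + b * xi + c * xi ** 2 for xi in x]
print(round(a, 3), round(b, 3), round(c, 5))
print(round(trend[0]), round(trend[-1]))
```

Solving equation II alone and eliminating a between I and III mirrors the hand procedure; a general linear solver would serve equally well but would hide the steps the text describes.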

As a check on the accuracy of the solution, the values of a and c are
substituted in equation III, which gives an agreement to eight digits.
This does not prove that the values originally substituted in the normal
equations were correct, but, granting their accuracy and barring
counterbalancing errors, there has been no important mistake in the solution of
equations I and III. It is desirable that this check be made, even though
no check for b is available in this setup. The equation may now be stated:
$Y_c = 89,295.81 + 7,887.722X + 188.22519X^2$, with origin between 1927
and 1928, and X units one-half year. The computation of the Yc values
is shown in the last four columns of Table 95, and the values are plotted
in Chart 163. It is apparent that the fit is excellent. The curve cuts
under all the prosperity peaks except 1927, which was an unusually low
peak, perhaps on account of the mild general depression in the United
States that year. Note that 1936 is a little unusual also. According to
precedent that year should have been a depression year, and in fact the
percentage rate of increase from 1935 to 1936 was less than from 1934 to
1935. Nevertheless, this year was not below the trend line. Again it
may be that the general business pickup in 1936 served to make the rayon
depression less severe than usual.

Third degree curve. By adding one more constant to the equation, we
are enabled to put one more bend into our trend curve. In Chart 164
are shown two such equations and curves. It should be noted that a
straight line has only one slope, a second degree curve slopes in a positive
direction at one stage and in a negative direction at another, while a third
degree curve includes three directions of slope.

Chart 163. Second Degree Curve Fitted to United States Consumption of Rayon,
1919-1936, and Trend Extension Through 1937. (Vertical scale: millions of
pounds. Data of Table 95.)

Four normal equations are required for a third degree curve:

I.    $\Sigma Y = Na + b\Sigma X + c\Sigma X^2 + d\Sigma X^3$.

II.   $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3 + d\Sigma X^4$.

III.  $\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4 + d\Sigma X^5$.

IV.   $\Sigma X^3Y = a\Sigma X^3 + b\Sigma X^4 + c\Sigma X^5 + d\Sigma X^6$.

Again, if the X origin is taken at the middle of the period, the odd powers
of X will cancel, leaving these equations:

I.    $\Sigma Y = Na + c\Sigma X^2$.

II.   $\Sigma XY = b\Sigma X^2 + d\Sigma X^4$.

III.  $\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4$.

IV.   $\Sigma X^3Y = b\Sigma X^4 + d\Sigma X^6$.

Chart 164. Third Degree Equations and Curves.

To obtain the trend equation, equations I and III, containing a and c,
must now be solved simultaneously, and likewise equations II and IV,
which contain b and d. Only one new column of figures in addition to
those of Table 95 need be computed; it is the column necessary to obtain
$\Sigma X^3Y$. Furthermore, equations I and III are exactly the same as for
the second degree curve; consequently the values of a and c will also be
the same.

A fourth degree curve, if origin is taken at the middle of the period,
necessitates these normal equations:

I.    $\Sigma Y = Na + c\Sigma X^2 + e\Sigma X^4$.

II.   $\Sigma XY = b\Sigma X^2 + d\Sigma X^4$.

III.  $\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4 + e\Sigma X^6$.

IV.   $\Sigma X^3Y = b\Sigma X^4 + d\Sigma X^6$.

V.    $\Sigma X^4Y = a\Sigma X^4 + c\Sigma X^6 + e\Sigma X^8$.

Such a setup permits equations II and IV to be solved simultaneously,
and equations I, III, and V. Some persons prefer, when solving three or
more equations simultaneously, to employ a systematic, self-checking
procedure, such as the Doolittle method explained in Chapter XXIII.

Empirical test of data. An additional property of this family of curves is:

(1) The first differences of a first degree curve are constant.

(2) The second differences of a second degree curve are constant.

(3) The third differences of a third degree curve are constant.

(4) The nth differences of an nth degree curve are constant.

The first and second differences of the trend values of the second degree 
fit are shown in Table 96. As can easily be observed, a first difference is 
merely the difference between any number and the number preceding it; 
while a second difference is the same thing with respect to first differences. 
The slight discrepancies in the fourth digit are, of course, due to the round- 
ing of the trend values. 

This knowledge concerning the properties of the potential series can be
used to test any time series data for trend type. If the data are
reasonably regular, it is possible to take successive differences of the data until
the differences most nearly approaching constancy are obtained. It is not
often enlightening, however, to so test the original data, for the successive
differences of the cyclical changes would be so pronounced that little could
be discovered about the underlying trend. When this is true, it is
probably better first to approximate a trend by one of the flexible methods
which do not involve the obtaining of an equation, and then to take the
differences of the trend. Such an approximation might be based on a
freehand smoothing of the high-low mid-point or some other cyclical
average method, or the moving average method. Although this is apparently
an objective method, in practice it is usually very difficult to determine
which column of differences is most nearly constant.
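
The difference test can be sketched as follows. The trend values are recomputed here from the second degree equation of Table 95 rather than copied from a printed table, and `differences` is an illustrative helper. Because successive years are two X units apart, the constant second difference works out to 8c.

```python
def differences(values):
    """Each value minus the value preceding it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Constants of the second degree trend fitted in Table 95.
a, b, c = 89295.810, 7887.722, 188.22519

xs = range(-17, 19, 2)                 # half-year units, 1919 through 1936
trend = [a + b * x + c * x * x for x in xs]

first = differences(trend)
second = differences(first)
third = differences(second)

print([round(d) for d in first[:4]])
print([round(d, 2) for d in second[:4]])
print(max(abs(d) for d in third))
```

With unrounded trend values the second differences agree exactly and the third differences vanish; the small irregularities seen in a printed table arise only from rounding the trend values to whole numbers.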


TABLE 96

First and Second Differences of Second Degree Trend
Values for United States Rayon Consumption Data,
1919-1936

(Thousands of pounds)

            Trend        First         Second
Year        values     differences   differences

1919         9,772
1920        13,481        3,709
1921        18,696        5,215         1,506
1922        25,416        6,720         1,505
1923        33,642        8,226         1,506
1924        43,375        9,733         1,507
1925        54,613       11,238         1,505
1926        67,357       12,744         1,506
1927        81,606       14,249         1,505
1928        97,362       15,756         1,507
1929       114,623       17,261         1,505
1930       133,390       18,767         1,506
1931       153,663       20,273         1,506
1932       175,442       21,779         1,506
1933       198,726       23,284         1,505
1934       223,516       24,790         1,506
1935       249,812       26,296         1,506
1936       277,614       27,802         1,506

Source: Table 95.


Orthogonal polynomials. A disadvantage of polynomial equations of
the type described is that each additional constant added to the equation
requires that some of the constants previously obtained be abandoned and
new constants computed to take their place. Thus, a second degree curve
uses the same value for b as a straight line, but requires a different value
for a; a third degree curve uses the same values for a and c as a second
degree curve, but requires a new value for b; a fourth degree curve uses
the same values for b and d as a third degree curve, but new values must
be calculated for a and c; and so on. Orthogonal polynomial equations
involve a transformation of such a nature that, as new constants are
added, the old constants remain the same. Such equations are very
convenient to use, since we merely build up our equation by adding new
constants until a satisfactory fit is obtained and simultaneous solution of
equations is avoided. There is thus no lost motion, and the labor involved
becomes progressively less than that required to fit a curve by the ordinary
method for equations of third degree and higher. The trend values
obtained by the two methods are exactly the same.

Although the labor required for fitting is modest, the theory of
orthogonal polynomials is beyond the scope of this text, and will not be
explained here. Whereas the ordinary third degree polynomial is of the type

$Y_c = a + bX + cX^2 + dX^3$,

the orthogonal polynomial is

$Y_c = A + BX_1 + CX_2 + DX_3$.

In working with orthogonal polynomials, the X origin is conveniently taken
at the middle so that $\Sigma X = 0$. If N is odd, the X values are taken as
..., -3, -2, -1, 0, +1, +2, +3, ... in the usual fashion; if N is even,
they are taken as ..., -2.5, -1.5, -.5, +.5, +1.5, +2.5, .... The
variables $X_1$, $X_2$, $X_3$, ... are derived from the moments of the X series.
In form easy to use, these are:

$X_1 = X$.

$X_2 = X_1^2 - \frac{N^2 - 1}{12}$.

$X_3 = X_1^3 - \frac{3N^2 - 7}{20} X_1$.

$X_{r+1} = X_1 X_r - \frac{r^2 (N^2 - r^2)}{4(4r^2 - 1)} X_{r-1}$.

N is, as usual, the number of items in the series (the number of years or
months) and r is the degree of the polynomial. Each of these equations
is worked out, and in the computation table there will be column headings
for $X_1$, $X_2$, and $X_3$. The constants A, B, C, and D will be obtained as
follows:

$A = \frac{\Sigma Y}{N}$.

$B = \frac{12}{N(N^2 - 1)} \Sigma X_1 Y$.

$C = \frac{180}{N(N^2 - 1)(N^2 - 4)} \Sigma X_2 Y$.

$D = \frac{2800}{N(N^2 - 1)(N^2 - 4)(N^2 - 9)} \Sigma X_3 Y$.

Coefficient of $X_r = \frac{(2r)!\,(2r + 1)!}{(r!)^4 N(N^2 - 1)(N^2 - 4) \cdots (N^2 - r^2)} \Sigma X_r Y$.

In obtaining the trend values, the constants are multiplied by $X_1$, $X_2$,
and $X_3$ instead of $X$, $X^2$, and $X^3$. For the theory of orthogonal
polynomials the reader is referred to R. A. Fisher, Statistical Methods for Research
Workers (7th Edition), pp. 148-155. Fisher also explains a short cut
method of fitting, which consists almost entirely of successive additions.
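
A minimal Python sketch of fitting by orthogonal polynomials follows, using the formulas for $X_1$, $X_2$, $X_3$ and for A, B, C, D given above. Function names are illustrative, the rayon figures are those of Table 95, and X is taken here in whole years from the middle of the period (so that the coefficients are not on the half-year scale used in Table 95).

```python
def orthogonal_terms(n):
    """Return the columns X1, X2, X3 for a series of n items centered on zero."""
    x = [i - (n - 1) / 2 for i in range(n)]
    x1 = list(x)
    x2 = [v ** 2 - (n ** 2 - 1) / 12 for v in x]
    x3 = [v ** 3 - (3 * n ** 2 - 7) / 20 * v for v in x]
    return x1, x2, x3


def dot(u, v):
    """Sum of products of two equally long columns."""
    return sum(p * q for p, q in zip(u, v))


def fit_orthogonal_cubic(y):
    """Constants A, B, C, D and trend values of Yc = A + B X1 + C X2 + D X3."""
    n = len(y)
    x1, x2, x3 = orthogonal_terms(n)
    A = sum(y) / n
    B = 12 / (n * (n ** 2 - 1)) * dot(x1, y)
    C = 180 / (n * (n ** 2 - 1) * (n ** 2 - 4)) * dot(x2, y)
    D = 2800 / (n * (n ** 2 - 1) * (n ** 2 - 4) * (n ** 2 - 9)) * dot(x3, y)
    trend = [A + B * p + C * q + D * r for p, q, r in zip(x1, x2, x3)]
    return (A, B, C, D), trend


rayon = [9291, 8718, 19751, 24747, 32558, 42243, 58277, 60630, 100048,
         100101, 131448, 117968, 157360, 152041, 211883, 194771, 252676,
         297594]

(A, B, C, D), trend = fit_orthogonal_cubic(rayon)
print(round(A, 2), round(B, 2), round(C, 2), round(D, 3))
```

Each constant is found from a single sum of products, with no simultaneous equations, and stopping after C gives the second degree fit without recomputing A or B.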

Use of Logarithms 

Straight line equation. It is quite apparent from inspection of Chart
165 that the curved trend line fits the Japanese industrial production data
much better than would a straight line. It is not always necessary,
however, to fit an equation with three constants in order to obtain a curve
with one upward bend in it. The exponential curve, $Y_c = ab^X$, which
represents a trend with constant rate of increase, is such a curve. In
logarithmic form it becomes $\log Y_c = \log a + X \log b$. Therefore, it is
usually advisable first to plot the data on semi-logarithmic paper (that is,
with logarithmic vertical scale) to see if the trend seems to straighten out.³
This is done in section B of this chart, and the straight line appears to
be a reasonable fit, indicating a constant percentage rate of growth, as
contrasted with a curve of the type $Y_c = a + bX + cX^2$, which indicates
a type of growth that is increasing absolutely but declining relatively.
In order to fit a line that is straight on semi-logarithmic paper, it is
necessary only to fit a straight line to the logarithms of the Y values. Thus in
Table 97, column 3, are recorded the logarithms of the production index.
From this point on, the procedure for fitting the curve is the same as for
any straight line. The normal equations for $\log Y_c = \log a + X \log b$,
with origin at the mean of the X values (middle year), are:

I.    $\Sigma \log Y = N \log a$.

II.   $\Sigma X \log Y = \log b \, \Sigma X^2$.

The solution of these equations gives the following values:

log a = 2.0158528,
log b = .0264071,

and the trend equation is

$\log Y_c = 2.015853 + .026407X$,

with origin at 1927, and X units of one year. In column 6 are recorded
the log Yc values. In order to obtain the arithmetic trend values
corresponding to the original data, it is necessary only to look up the
antilogarithms of the log Yc values. The trend line is plotted in both sections
of Chart 165.

3 If a series does not become a straight line when plotted on semi-logarithmic paper,
it is sometimes possible to accomplish this result by subtracting, algebraically, a correction
factor from each observation. After the trend is fitted, the correction factor is added,
algebraically. In order to obtain the correction factor, we first divide the data into
three equal groups of years and compute partial totals ($\Sigma_1 Y$; $\Sigma_2 Y$; $\Sigma_3 Y$) for each
section. The correction factor is

$\frac{1}{n} \left[ \frac{(\Sigma_1 Y)(\Sigma_3 Y) - (\Sigma_2 Y)^2}{\Sigma_1 Y + \Sigma_3 Y - 2\Sigma_2 Y} \right]$.

In this expression n is the number of observations in a group. Compare with Frederick
C. Mills, Statistical Methods, pp. 667-671, Henry Holt and Company, New York, 1938
(Revised Edition).

Chart 165. Japanese Industrial Production and Straight Line Trend Fitted to
Logarithms, 1919-1935: A. Arithmetic Vertical Scale; B. Logarithmic Vertical Scale. (Data
of Table 97.)

Since we multiply numbers by adding their logarithms and raise them to 

TABLE 97

Computation of Straight Line Trend to Logarithms of Japanese Industrial
Production, 1919-1935

        Production                                      Computation of
Year    (1926 = 100)                                    trend values
             Y          log Y       X      X log Y      log Yc       Yc
 (1)        (2)          (3)       (4)       (5)          (6)        (7)

1919       72.51       1.860398    -8    -14.883184    1.804597     63.77
1920       65.07       1.813381    -7    -12.693667    1.831004     67.76
1921       63.03       1.799547    -6    -10.797282    1.857411     72.01
1922       73.07       1.863739    -5     -9.318695    1.883818     76.53
1923       80.61       1.906389    -4     -7.625556    1.910225     81.33
1924       88.54       1.947140    -3     -5.841420    1.936632     86.42
1925       98.33       1.992686    -2     -3.985372    1.963039     91.84
1926      100.00       2.000000    -1     -2.000000    1.989446     97.60
1927      106.40       2.026942     0      0           2.015853    103.72
1928      113.20       2.053846     1      2.053846    2.042260    110.22
1929      128.70       2.109579     2      4.219158    2.068667    117.13
1930      120.70       2.081707     3      6.245121    2.095074    124.47
1931      116.00       2.064458     4      8.257832    2.121481    132.28
1932      124.5        2.095169     5     10.475845    2.147888    140.57
1933      149.6        2.174932     6     13.049592    2.174295    149.38
1934      165.5        2.218798     7     15.531586    2.200702    158.75
1935      182.3        2.260787     8     18.086296    2.227109    168.62

Total                 34.269498           10.774100      ...

Source: Standard Statistics Company, Inc., Standard Trade and Securities,
Basic Statistics, Vol SO, June 0 . 1936, Other countries, p 1-19.


Logarithmic Form

   I.  34.269498 = 17 log a
       log a = 2.0158528.

  II.  10.774100 = 408 log b
       log b = .0264071.

  Trend equation:
       log Yc = 2.015853 + .026407X
  Origin, 1927; X units, one year.

Natural Form

       a = 103.718.
       b = 1.06269.

  Trend equation:
       $Y_c = 103.718(1.06269)^X$
  Origin, 1927; X units, one year.

a power by multiplying the logarithm of the number by that power, we
may change

$$\log Y_c = \log a + X \log b$$

to

$$Y_c = ab^X.$$

Below Table 97 it is shown that a = 103.718 and b = 1.06269. Therefore,
the trend equation

$$\log Y_c = 2.015853 + .026407X$$

may be written

$$Y_c = 103.718(1.06269)^X.$$

The advantage of the equation in this form is that it shows 103.718 to be
the trend for 1927, and that Japanese industrial production has a normal
annual growth of 6.269 per cent. Incidentally it might be noted that
103.718 is the geometric mean of the series.
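
The computation of Table 97 can be retraced in a few lines of code. The sketch below is only an illustration: the function name `fit_log_linear_trend` is hypothetical, and it assumes an odd number of equally spaced years so that, with the origin at the middle year, the two normal equations separate as they do below Table 97.

```python
import math

# Japanese industrial production index, 1919-1935 (column 2 of Table 97).
production = [72.51, 65.07, 63.03, 73.07, 80.61, 88.54, 98.33, 100.00,
              106.40, 113.20, 128.70, 120.70, 116.00, 124.5, 149.6,
              165.5, 182.3]


def fit_log_linear_trend(values):
    """Fit log10(Yc) = log a + X log b with the origin at the middle period.

    Assumes an odd number of equally spaced periods, so that sum(X) = 0.
    """
    n = len(values)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of periods")
    half = n // 2
    xs = list(range(-half, half + 1))
    logs = [math.log10(y) for y in values]
    # Equation I:  sum(log Y) = N log a
    log_a = sum(logs) / n
    # Equation II: sum(X log Y) = log b * sum(X^2)
    log_b = sum(x * ly for x, ly in zip(xs, logs)) / sum(x * x for x in xs)
    return log_a, log_b


log_a, log_b = fit_log_linear_trend(production)
a, b = 10 ** log_a, 10 ** log_b
print(f"log Yc = {log_a:.6f} + {log_b:.6f} X")
print(f"Yc = {a:.3f} ({b:.5f})^X")
print(f"annual growth: {100 * (b - 1):.3f} per cent")
```

The antilogarithm of the slope gives the annual growth ratio directly, which is why the natural form of the equation is the easier one to interpret.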


Chart 166. United States Rayon Consumption and Second-Degree Curve Fitted to
Logarithms, 1919-1936, and Trend Extension Through 1937. (Vertical scale:
millions of pounds. Data of Tables 80 and 98.)


Since the geometric mean is always a little smaller than the arithmetic
mean, and since the sum of the squares of the deviations of the logarithms
(rather than the original data) is at a minimum for this trend, the sum
of the deviations above the trend line of Chart 165A is slightly greater
than the sum of those below it. This is possibly a slight objection to this
type of trend. On the other hand, the deviations on either side of the
line in section B do cancel. Furthermore, use of logarithms equalizes the
importance of the cycles in the early years, when the data are of small
absolute size, with those of the later years, when the cycles are larger
in absolute size. By so doing, the trend line is more likely to go through
all the cycles than merely the more recent ones. Probably this point
more than offsets the other rather technical disadvantage.

TABLE 98

Computation of Second Degree Trend to Logarithms of United States Rayon
Consumption, 1919-1936

(Thousands of pounds)

Second degree curve. Although the rayon consumption data are concave
upward when plotted on arithmetic paper, Chart 166 reveals that the use
of ratio paper more than straightens out the curve, making it concave
downward. It is therefore apparent that, if logarithms of the consumption
figures are used, a second degree semi-logarithmic curve in which c is
negative might give a reasonably good fit. The equation type is

$$\log Y_c = \log a + X \log b + X^2 \log c.$$

Or, for the sake of convenience, it may be written simply

$$\log Y_c = a + bX + cX^2$$

where a, b, and c represent logarithms.

The normal equations required, when the origin is at the middle year,
are

   I.  $\sum \log Y = Na + c\sum X^2$.

  II.  $\sum X \log Y = b\sum X^2$.

 III.  $\sum X^2 \log Y = a\sum X^2 + c\sum X^4$.

All computations, including the Yc values, are given in Table 98. The 
method of obtaining this type of trend will be apparent from inspection 
of this table. The trend values are plotted in Chart 166. The trend 
line seems to fit the data reasonably well, except that it lies below the data 
from 1921 through 1925. Also, this trend would eventually turn down- 
ward, which is entirely illogical. For these reasons the second degree 
curve to the natural numbers is probably a better trend. 
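
The solution of the three normal equations can be sketched as follows. This is an illustration rather than the authors' worksheet: the function name is hypothetical, the data are artificial (generated from known constants so that the result can be checked), and for an even number of years the sketch assumes X is measured in half-year units so that the sums of X and of X cubed vanish.

```python
import math


def fit_second_degree_log_trend(values):
    """Solve the normal equations for log10(Yc) = a + bX + cX^2.

    X is measured from the middle of the period. With an even number of
    periods X is taken in half-period units (..., -3, -1, 1, 3, ...).
    """
    n = len(values)
    if n % 2:
        xs = [i - n // 2 for i in range(n)]
    else:
        xs = [2 * i - (n - 1) for i in range(n)]
    logs = [math.log10(y) for y in values]
    s_x2 = sum(x ** 2 for x in xs)
    s_x4 = sum(x ** 4 for x in xs)
    s_y = sum(logs)
    s_xy = sum(x * ly for x, ly in zip(xs, logs))
    s_x2y = sum(x * x * ly for x, ly in zip(xs, logs))
    # Equation II stands alone and gives b.
    b = s_xy / s_x2
    # Equations I and III form a two-by-two system in a and c.
    c = (n * s_x2y - s_x2 * s_y) / (n * s_x4 - s_x2 ** 2)
    a = (s_y - c * s_x2) / n
    return a, b, c


# Artificial series built from a = 2.0, b = .05, c = -.002 (nine periods).
artificial = [10 ** (2.0 + 0.05 * x - 0.002 * x * x) for x in range(-4, 5)]
a, b, c = fit_second_degree_log_trend(artificial)
print(a, b, c)
```

A negative c, as in this artificial series, is what makes the fitted curve concave downward on ratio paper and eventually turn down.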

Curves with Declining Absolute Growth 

It is a characteristic of many series that, although the direction of growth 
remains positive, the increment of growth declines with the passage of 
time. A few curve types will be mentioned which fulfill the above require- 
ment. 

(1) Modified polynomials: $Y_c = a + bX^{1/2}$. This may be expanded
as follows to include additional constants: $Y_c = a + bX^{1/2} + cX + \cdots$.
Of course, some of the constants may be negative, in which case the curve
may ultimately turn down.

(2) Straight line to log X: $Y_c = a + b \log X$. It is difficult to find
any logical justification for using logarithms of time, but occasionally such
a formula gives a close fit.

(3) Parabolic curve to log Y: $\log Y_c = aX^b$. In order to fit this
curve, the formula should be stated in the following form:
$\log \log Y_c = \log a + b \log X$.

(4) Modified exponential: $Y_c = k + ab^X$. This curve describes a series
the absolute growth of which decreases by a constant proportion when a
is negative and b is less than one.* This curve and modifications of it
resulting from use of the logarithms (Gompertz curve) or reciprocals
(logistic curve) of the Y values will be discussed in detail in the next
section.

Asymptotic Growth Curves 

Modified exponential. Not only do many series gradually taper off,
but it is often true that they approach an upper limit or asymptote.
Perhaps the simplest type is one in which the amount of growth declines by a
constant percentage. Now, the ordinary exponential (or compound interest)
curve, written $Y_c = ab^X$, describes a curve the amount of change
in which increases by a constant percentage if b is a positive number
greater than one, but declines by a constant percentage if b is a positive
number less than one. Furthermore, if the growth is declining by a constant
percentage, the amount of growth approaches zero as a limit; if
now we add another constant to the equation so that it reads
$Y_c = k + ab^X$, the curve will approach k as a lower limit if a is
positive, but k will be the upper limit if a is negative.

TABLE 99

Hypothetical Data for Modified Exponential Curve
(Asymptote k = 114)

                     Partial                    Per cent of
   X       Y         totals      Increment      preceding increment
  (1)     (2)         (3)           (4)               (5)

   0     50
   1     66         116.0000       16
   2     78                        12                 75
   3     87         165.0000        9                 75
   4     93.75                      6.75              75
   5     98.8125    192.5625        5.0625            75

A series the amount of growth of which is decreasing by a constant
percentage is shown in the first two columns of Table 99. As can be seen
in columns 4 and 5, each first difference is 75 per cent of the preceding
first difference. The increments of increase are $\Delta_1$, $\Delta_2$,
$\Delta_3$, $\Delta_4$, and $\Delta_5$, and

$$\frac{\Delta_2}{\Delta_1} = \frac{\Delta_3}{\Delta_2} = \frac{\Delta_4}{\Delta_3} = \frac{\Delta_5}{\Delta_4} = .75.$$

* If the absolute growth decreases by a constant proportion, then the values
of the series increase at a decreasing percentage rate. Another curve which
increases at a decreasing percentage rate is the type Yc = [...]. Further
reference to this curve will be made in Chapter XXI. See p. 635.

Referring to Chart 167, the horizontal broken line near the top of the
chart is the value k that the curve of this series approaches; in this case
k is 114. This means that, if we should extend the trend line indefinitely,
it would approach closer and closer to this value, but never quite equal it.
The second constant, a, the value obtained by subtracting the asymptote
k from the trend value when X is zero, in this instance is -64. The
third constant, b, is, of course, the ratio between successive increments of
growth, or .75 for this series. In Chart 167 the vertical broken line when
X = 1 is $-64(.75) = -48$; when X = 2 it is $-64(.75)^2 = -36$; and so
on for the other values of X. Thus these vertical broken lines are
described by the expression $ab^X$. This is true when X = 0 also, since
$-64(.75)^0 = -64$. In the diagram $ab^X$ is represented by the height of
the shaded area. If now, in turn, we subtract from k the value of each
of the vertical broken lines, we have the trend values represented by large
dots on the chart. The vertical broken lines are subtracted from k because
the sign of a is negative. Thus:

   X      k + ab^X          = Yc

   0      114 - 64          = 50
   1      114 - 48          = 66
   2      114 - 36          = 78
   3      114 - 27          = 87
   4      114 - 20.25       = 93.75
   5      114 - 15.1875     = 98.8125

Chart 167. Artificial Data Conforming to Modified Exponential Equation.

It is therefore evident that the equation is of the type $Y_c = k + ab^X$.
The sign of a is always negative if the increments of growth are declining.
As is already obvious, for this series of data the equation is
$Y_c = 114 - 64(.75)^X$.
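
As a quick check on the arithmetic of Table 99, the trend values, their first differences, and the ratios of successive differences can be generated from the three constants. The helper name `modified_exponential` is hypothetical; the constants are those of the text.

```python
def modified_exponential(k, a, b, x):
    """Trend value Yc = k + a * b**x."""
    return k + a * b ** x


k, a, b = 114, -64, 0.75
trend = [modified_exponential(k, a, b, x) for x in range(6)]
# First differences (column 4 of Table 99).
increments = [later - earlier for earlier, later in zip(trend, trend[1:])]
# Each increment as a proportion of the preceding one (column 5).
ratios = [later / earlier for earlier, later in zip(increments, increments[1:])]

print(trend)
print(increments)
print(ratios)
```

Every ratio equals b, which is the defining property of the modified exponential: the amount of growth falls by a constant percentage while the level creeps toward k.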

Since this curve has three constants (k, the asymptote; a, the distance
between the value of Yc when X = 0 and the asymptote; and b, the ratio
between successive first differences), three equations are required for its
fitting. These are obtained by first dividing the data into three equal
sections, as in Table 99. Then the Y values are totaled for each section,
as in column 3. The results are:

      $\Sigma_1 Y = 116.$
      $\Sigma_2 Y = 165.$
      $\Sigma_3 Y = 192.5625.$

Let us note what 116 represents in terms of our equation. It is the sum
of 50 + 66. But 50 is $k + ab^0$ and 66 is $k + ab^1$; so

      $116 = 2k + a + ab.$

This is equation I. The other two are obtained in similar fashion. The
three equations are:

   I.  $116 = 2k + a + ab.$
  II.  $165 = 2k + ab^2 + ab^3.$
 III.  $192.5625 = 2k + ab^4 + ab^5.$

In order to solve for b, we first subtract equation I from equation II,
obtaining equation A; and then equation II from equation III, obtaining
equation B. Thus:

   A.  $49 = ab^3 + ab^2 - ab - a = a(b^3 + b^2 - b - 1).$
   B.  $27.5625 = ab^5 + ab^4 - ab^3 - ab^2 = ab^2(b^3 + b^2 - b - 1).$

The constant b is now obtained by dividing equation B by equation A.
We shall call the resulting equation C.

   C.  $\frac{27.5625}{49} = \frac{ab^2(b^3 + b^2 - b - 1)}{a(b^3 + b^2 - b - 1)}$

       $b^2 = .5625$
       $b = .75.$

The value of a may now be obtained by substituting in equation A or B.

   A.  $49 = a(.75^3 + .75^2 - .75 - 1)$
       $49 = -.765625a$
       $a = -64.$

The remaining constant k may be obtained by substituting the values of
a and b in any of the original equations.

   I.  $116 = 2k - 64 - 64(.75)$
       $2k = 228$
       $k = 114.$

The values of the constants are thus found to be those which we knew to
be correct. The equation was not obtained by the method of least squares,
but was so fitted that the three partial totals of the trend values were the
same as those of the original data. In this case, since the original data
conform to the equation type perfectly, the fitted curve passes through
all of the original points.

The logical procedure, which has been explained, can be developed into
more convenient formulae, which are as follows:*

$$b^n = \frac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y}$$

$$a = (\Sigma_2 Y - \Sigma_1 Y)\,\frac{b - 1}{(b^n - 1)^2}$$

$$k = \frac{1}{n}\left[\Sigma_1 Y - \frac{b^n - 1}{b - 1}\,a\right]$$

where n is the number of years in each third of the data.

* The derivation of these formulae is given in Appendix B, section XVI-1.

Chap. 16]    ASYMPTOTIC GROWTH CURVES    445

Solving by these formulae requires, of course, that b be obtained first,
then a, and finally k.

The value of k may also be obtained by use of the following expressions:

$$k = \frac{1}{n}\left[\Sigma_1 Y - \frac{\Sigma_2 Y - \Sigma_1 Y}{b^n - 1}\right]$$

which does not involve the determination of a, and

$$k = \frac{1}{n}\left[\frac{(\Sigma_1 Y)(\Sigma_3 Y) - (\Sigma_2 Y)^2}{\Sigma_1 Y + \Sigma_3 Y - 2\Sigma_2 Y}\right]$$

which does not involve the determination of either a or b. These
expressions for k may be derived by substituting the expressions for a and
b in the equation

$$k = \frac{1}{n}\left[\Sigma_1 Y - \frac{b^n - 1}{b - 1}\,a\right].$$

Although we have used the equation $Y_c = k + ab^X$ to express the trend
of a series the amount of increase in which is decreasing at a constant rate,
this equation type may be used for series the amount of increase, or
decrease, in which tends to decrease, or increase, at a constant rate. In any
event k will always be the asymptote at $X = \pm\infty$.
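
The formulae can be tried on the artificial data of Table 99. The following is a minimal sketch with a hypothetical function name; it assumes the number of observations is divisible by three and that X = 0 at the first observation. It also evaluates the two alternative expressions for k so that their agreement can be seen.

```python
def fit_modified_exponential(values):
    """Fit Yc = k + a * b**X by the method of three partial totals."""
    if not values or len(values) % 3:
        raise ValueError("need a number of observations divisible by three")
    n = len(values) // 3
    s1, s2, s3 = (sum(values[i * n:(i + 1) * n]) for i in range(3))
    # b**n; this sketch assumes the ratio is positive so a real root exists.
    bn = (s3 - s2) / (s2 - s1)
    b = bn ** (1 / n)
    a = (s2 - s1) * (b - 1) / (bn - 1) ** 2
    k = (s1 - (bn - 1) / (b - 1) * a) / n
    # Alternative expressions for k: without a, and from the totals alone.
    k_without_a = (s1 - (s2 - s1) / (bn - 1)) / n
    k_totals_only = (s1 * s3 - s2 ** 2) / (s1 + s3 - 2 * s2) / n
    return k, a, b, k_without_a, k_totals_only


table_99 = [50, 66, 78, 87, 93.75, 98.8125]
k, a, b, k_without_a, k_totals_only = fit_modified_exponential(table_99)
print(k, a, b, k_without_a, k_totals_only)
```

Because these data conform exactly to the equation type, the constants recovered are those used to build the table.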

Modified exponential fitted to department store sales. One of the
difficulties which the world depression of the thirties occasioned for
statisticians was the question of the sort of trend to use during this
prolonged economic breakdown. Many trends that seemed appropriate before
1931 became rather absurd if the 1932 and 1933 data were included; for
instance, this one cycle might cause the trend to turn downward. One
solution that was commonly used was to refrain from revising the trend, but
to extend the trend that had been fitted to data prior to 1931. Many people
believed, however, that the extension of the old trend would tend to
exaggerate the depression by making the trend too high. This is but another
way of saying that the trend after the depression was expected to differ
from the old trend, as to either its level or its slope. One conservative
practice, therefore, was to select a trend type that would tend to flatten
out without actually bending down. The modified exponential is well
suited to do this for some series. As an illustration of method, department
store sales will be used. The trend has been fitted to the 1919-1930 data
and extended through 1936. The original data and the trend are plotted
on Chart 168. The trend equation is

$$Y_c = 110.270 - 32.7894(.7814348)^X.$$

The asymptote of this curve is therefore 110.27 per cent. Although it is
not likely that department store sales will always remain below this figure,
nevertheless it is not an unreasonable equation to use until business regains
sufficient stability to permit a more appropriate trend to be selected.

The computation of this trend is given in Table 100. The constants
are obtained by means of the formulae on page 444. The last three columns
of the table are for obtaining the trend values. Column $b^X$ is obtained
by placing the value of b in the keyboard of the calculating machine
and multiplying it by the value of $b^X$ last obtained. It is advisable to
obtain column $b^X$ as soon as the value of b is obtained. If no mistakes
have been made, the value of $b^n$ in the table will agree with that obtained
by the formula. Column $ab^X$ is obtained by putting the value of a in
the keyboard and multiplying by the appropriate $b^X$ values. A final check
on the accuracy of the work is obtained by summing each of the three
sets of Yc values. (Note check marks in Table 100.) This is equivalent
to verifying by substituting the value of the constants in the normal
equations.

Chart 168. Modified Exponential Trend Fitted to Federal Reserve Index of
Department Store Sales, 1919-1930, and Trend Extension Through 1937.
(Vertical scale: per cent. Data of Table 100 and United States Department of
Commerce, Survey of Current Business, March 1938, p. 27.)

TABLE 100

Computation of Modified Exponential Trend to Federal Reserve Index of
Department Store Sales

(1923-1925 = 100)

                       Y               Computation of trend values
 Year        X    (sales index)      b^X          ab^X       Yc = k + ab^X

 1919        0        78          1.0000000     -32.789         77.48
 1920        1        94           .7814348     -25.623         84.65
 1921        2        87           .6106403     -20.023         90.25
 1922        3        88           .4771756     -15.646         94.62
 Σ1Y                 347                                       347.00 ✓

 1923        4        98           .3728816 ✓   -12.227         98.04
 1924        5        99           .2913827      -9.554        100.72
 1925        6       103           .2276966      -7.466        102.80
 1926        7       106           .1779300      -5.834        104.44
 Σ2Y                 406                                       406.00 ✓

 1927        8       107           .1390407      -4.559        105.71
 1928        9       108           .1086512      -3.563        106.71
 1929       10       111           .0849038      -2.784        107.49
 1930       11       102           .0663468      -2.175        108.10
 Σ3Y                 428                                       428.01 ✓

 1931       12                     .0518457      -1.700        108.57*
 1932       13                     .0405140      -1.328        108.94*
 1933       14                     .0316590      -1.038        109.23*
 1934       15                     .0247394      -0.811        109.46*
 1935       16                     .0193322       -.634        109.64*
 1936       17                     .0151069       -.495        109.78*

* Trend values extended beyond data.

Source: Data for 1919-1935 from United States Department of Commerce, Survey
of Current Business, 1936 Supplement, p. 27; data for 1936 from Standard
Trade and [...] Statistics, April 1937, p. 13.

$$b^4 = \frac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y} = \frac{428 - 406}{406 - 347} = \frac{22}{59} = .3728814.$$

      4 log b = log .3728814 = 9.571570 - 10 = -0.428430
        log b = -.1071075 = 9.8928925 - 10
            b = .7814348.

$$a = (\Sigma_2 Y - \Sigma_1 Y)\,\frac{b - 1}{(b^n - 1)^2} = 59\left[\frac{.7814348 - 1}{(.3728814 - 1)^2}\right] = 59\left[\frac{-.2185652}{(-.6271186)^2}\right]$$

$$= -59\left[\frac{.2185652}{.3932777}\right] = -59(.5557528) = -32.7894152.$$

$$k = \frac{1}{n}\left[\Sigma_1 Y - \frac{b^n - 1}{b - 1}\,a\right] = \frac{1}{4}\left[347 - \frac{.3728814 - 1}{.7814348 - 1}\,(-32.789415)\right]$$

$$= \tfrac{1}{4}\,[347 - (2.8692618)(-32.789415)] = \tfrac{1}{4}\,(347 + 94.0810880) = 110.270.$$

$$Y_c = 110.270 - 32.7894152(.7814348)^X,$$

with origin at 1919 and X units 1 year.
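
The worksheet of Table 100, including the final check on the partial totals, can be sketched as below. The variable names are illustrative only; the sales figures are the twelve index numbers for 1919-1930, and the column of powers of b is built by repeated multiplication in the manner described for the calculating machine.

```python
# Federal Reserve index of department store sales, 1919-1930 (Table 100).
sales = [78, 94, 87, 88, 98, 99, 103, 106, 107, 108, 111, 102]

n = len(sales) // 3
s1, s2, s3 = (sum(sales[i * n:(i + 1) * n]) for i in range(3))

bn = (s3 - s2) / (s2 - s1)              # b**n, here b**4 = 22/59
b = bn ** (1 / n)
a = (s2 - s1) * (b - 1) / (bn - 1) ** 2
k = (s1 - (bn - 1) / (b - 1) * a) / n

# Column b^X by repeated multiplication, X = 0 (1919) through 17 (1936).
powers = [1.0]
for _ in range(17):
    powers.append(powers[-1] * b)
trend = [k + a * p for p in powers]

# Final check: each third of the fitted trend totals the same as the data.
trend_totals = [sum(trend[i * n:(i + 1) * n]) for i in range(3)]

print(f"b = {b:.7f}, a = {a:.4f}, k = {k:.3f}")
print([round(t, 2) for t in trend_totals])
```

The agreement of the three trend totals with 347, 406, and 428 is the property by which the curve was fitted, so it tests the arithmetic rather than the goodness of fit.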

Gompertz curve. More commonly used than the modified exponential
just described is the Gompertz curve. While the modified exponential
has to do with a series the growth increments of which are declining by a
constant percentage, the Gompertz curve describes a series in which the
growth increments of the logarithms are declining by a constant percentage.
The natural values of the series described by the Gompertz curve show a
declining rate of growth, but the rate does not decline by either a constant
amount or a constant percentage. A second degree curve may also describe
a series whose rate of increase is decreasing, but such a curve does
not flatten out at the top. Furthermore, while the second degree curve
is not asymptotic, and the modified exponential has only an upper limit,
the Gompertz curve is asymptotic at both ends, the lower asymptote being
zero.

A Gompertz curve fitted to the rayon data from 1919 through 1936 and
extended in both directions to include 1915 and 1950 is shown in Chart
169. The extensions are for the purpose of illustrating the shape of the
curve; no prediction is intended. It will be noticed that the amount of
growth is small at first, then becomes larger until it reaches a point of
inflection, after which it declines and finally approaches, but never reaches,
zero. This general shape of the trend is common to many industries and
has led Prescott* to the conclusion that it describes a law of growth.
According to Prescott, this trend is a function of population growth, the
curve of which typically is similar in appearance, but it is also partly due
to the development of the individual industry. He believes that the
growth of an industry may be divided into four stages:

(1) Period of experimentation.

(2) Period of growth into the social fabric.

(3) Through the point where growth increases but at a diminishing rate.

(4) Period of stability.

Although these stages are not very specifically demarcated by Prescott,
apparently rayon consumption is now in the third stage, for it is not until
1935 that the increment of trend growth begins to fall off. Prescott also
claims for this type of curve that it is useful in forecasting the future of
an industry, since it is not only a logical curve but, on account of its
tendency to flatten out, it tends to be conservative in its forecasts. The
horizontal dashed line of section A of Chart 169 indicates that the upper
limit of United States rayon consumption will eventually be about
520,000,000 pounds per year. It seems quite likely, however, that the
Gompertz curve is unduly conservative in this instance.

* "Law of Growth in Forecasting Demand," by Raymond D. Prescott, Journal of
the American Statistical Association, Vol. XVIII, December 1922, pp. 471-479.

Chart 169. Gompertz Curve Fitted to United States Rayon Consumption, 1919-1936:
A. Arithmetic Vertical Scale; B. Logarithmic Vertical Scale. (Trend is extended to
1915 and to 1960 to show general shape of curve. Data of Tables 80 and 101.)

The same data and trend are shown on ratio paper in section B of Chart
169. On this type of paper the resemblance to the modified exponential
curve (drawn on arithmetic paper) is apparent. In fact the Gompertz
curve is exactly like the modified exponential except that it is the
increments of increase in the logarithms of the Y values that are declining
at a constant rate. Consequently the equation type† may be written
$$\log Y_c = \log k + (\log a)\,b^X$$

and the constants are obtained by the following formulae:

$$b^n = \frac{\Sigma_3 \log Y - \Sigma_2 \log Y}{\Sigma_2 \log Y - \Sigma_1 \log Y}$$

$$\log a = (\Sigma_2 \log Y - \Sigma_1 \log Y)\,\frac{b - 1}{(b^n - 1)^2}$$

$$\log k = \frac{1}{n}\left[\Sigma_1 \log Y - \frac{b^n - 1}{b - 1}\,\log a\right]$$

$$= \frac{1}{n}\left[\Sigma_1 \log Y - \frac{\Sigma_2 \log Y - \Sigma_1 \log Y}{b^n - 1}\right]$$

$$= \frac{1}{n}\left[\frac{(\Sigma_1 \log Y)(\Sigma_3 \log Y) - (\Sigma_2 \log Y)^2}{\Sigma_1 \log Y + \Sigma_3 \log Y - 2\Sigma_2 \log Y}\right].$$

Since these formulae exactly parallel those of the modified exponential
curve, no explanation of their use is necessary. However, if the reader
will refer to Table 101, he will find there all of the important computations
necessary to obtain the trend equation for rayon consumption. This
equation is

$$\log Y_c = 5.716174 - 1.8259494(.9002623)^X$$

with origin at 1919 and X units of one year.

This equation type may also be written in the form:

$$Y_c = ka^{b^X}.$$

In order to put the rayon equation in this form, it is necessary to look up
the anti-logs of the constants k and a, which are in logarithmic form.
Since 5.716174 is the log of 520,205 and -1.8259494, or 8.1740506 - 10,
is the log of .0149297, the equation becomes

$$Y_c = 520{,}205(.0149297)^{.9002623^X}.$$

Since b is .9002623, increments of growth in the logarithms of the trend
values are each 90.02623 per cent of the preceding year. The value of b
will always be less than one if the rate of growth of the series is declining.
Since b - 1 will be negative under those circumstances, so will log a (see
equation for log a); and a will be less than 1. Therefore the greater the
value of X, the smaller becomes the value of $b^X$. As this value approaches
zero, the value of $a^{b^X}$ approaches 1, and Yc approaches k, or 520,205,
which is the upper asymptote. On the other hand, when X is zero, $a^{b^X}$
has the same value as a, which is .0149297, and Yc is
520,205 × .0149297 = 7,766.5.

† While generally used for data the logarithms of which tend to increase by
amounts which are decreasing at a constant rate, it may be applied to data the
logarithms of which tend to increase, or decrease, by amounts which are
decreasing, or increasing, at a constant rate.

TABLE 101

Computation of Gompertz Equation Fitted to United States Consumption of
Rayon, 1919-1936

(Thousands of pounds)

                                               Computation of trend values
                                                                log Yc =
             Con-                                               log k +
 Year    X   sumption    log Y       b^X        (log a)b^X     (log a)b^X      Yc

 1919    0     9,291    3.968062   1.0000000    -1.825949      3.890225       7,766
 1920    1     8,718    3.940417    .9002623    -1.643833      4.072341      11,810
 1921    2    19,751    4.295589    .8104722    -1.479881      4.236293      17,230
 1922    3    24,747    4.393423    .7296376    -1.332281      4.383893      24,200
 1923    4    32,558    4.512657    .6568652    -1.199403      4.516771      32,870
 1924    5    42,243    4.625755    .5913510    -1.079777      4.636397      43,290
 Σ1 log Y              25.735903                              25.735920 ✓

 1925    6    58,277    4.765497    .5323710 ✓   -.972083      4.744091      55,470
 1926    7    60,630    4.782688    .4792735     -.875129      4.841045      69,350
 1927    8   100,048    5.000208    .4314719     -.787846      4.928328      84,790
 1928    9   100,101    5.000438    .3884379     -.709268      5.006906     101,600
 1929   10   131,448    5.118753    .3496960     -.638527      5.077647     119,600
 1930   11   117,968    5.071766    .3148181     -.574842      5.141332     138,500
 Σ2 log Y              29.739350                              29.739349 ✓

 1931   12   157,360    5.196895    .2834189     -.517509      5.198665     158,000
 1932   13   152,041    5.181962    .2551514     -.465894      5.250280     177,900
 1933   14   211,883    5.326096    .2297032     -.419426      5.296748     198,000
 1934   15   197,771    5.289524    .2067931     -.377594      5.338580     218,100
 1935   16   252,676    5.402564    .1861680     -.339933      5.376241     237,800
 1936   17   297,594    5.473624    .1676000     -.306029      5.410145     257,100
 Σ3 log Y              31.870665                              31.870659 ✓

Source: See Table 80.

$$b^6 = \frac{\Sigma_3 \log Y - \Sigma_2 \log Y}{\Sigma_2 \log Y - \Sigma_1 \log Y} = \frac{31.870665 - 29.739350}{29.739350 - 25.735903} = \frac{2.131315}{4.003447} = .53236998.$$

      log b^6 = 9.72621338 - 10 = -.27378662
        log b = -.045631103 = 9.954368897 - 10
            b = .9002623.

$$\log a = (\Sigma_2 \log Y - \Sigma_1 \log Y)\,\frac{b - 1}{(b^n - 1)^2} = 4.003447\,\frac{.9002623 - 1}{(.53236998 - 1)^2} = 4.003447\,\frac{-.0997377}{(-.46763002)^2}$$

$$= -1.8259494, \text{ or } \bar{2}.1740506, \text{ or } 8.1740506 - 10.$$

$$\log k = \frac{1}{n}\left[\Sigma_1 \log Y - \frac{b^n - 1}{b - 1}\,\log a\right] = \frac{1}{6}\left[25.735903 - \frac{-.46763002}{-.0997377}\,(-1.8259494)\right] = 5.716174.$$

Trend equation:

$$\log Y_c = 5.716174 - 1.8259494(.9002623)^X$$

$$Y_c = 520{,}205(.0149297)^{.9002623^X}$$

With origin at 1919 and X units 1 year.
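
Since the Gompertz constants come from the same formulae applied to logarithms, the computation below Table 101 can be sketched in a few lines. The function names are hypothetical, and the sketch takes the log Y column of Table 101 as given rather than recomputing the logarithms from the consumption figures.

```python
# log10 of rayon consumption, 1919-1936 (column "log Y" of Table 101).
log_y = [3.968062, 3.940417, 4.295589, 4.393423, 4.512657, 4.625755,
         4.765497, 4.782688, 5.000208, 5.000438, 5.118753, 5.071766,
         5.196895, 5.181962, 5.326096, 5.289524, 5.402564, 5.473624]


def fit_gompertz(logs):
    """Fit log Yc = log k + (log a) * b**X by three partial totals of log Y."""
    if not logs or len(logs) % 3:
        raise ValueError("need a number of observations divisible by three")
    n = len(logs) // 3
    s1, s2, s3 = (sum(logs[i * n:(i + 1) * n]) for i in range(3))
    bn = (s3 - s2) / (s2 - s1)
    b = bn ** (1 / n)
    log_a = (s2 - s1) * (b - 1) / (bn - 1) ** 2
    log_k = (s1 - (bn - 1) / (b - 1) * log_a) / n
    return log_k, log_a, b


def gompertz_trend(log_k, log_a, b, x):
    """Trend value Yc = k * a**(b**x), computed through logarithms."""
    return 10 ** (log_k + log_a * b ** x)


log_k, log_a, b = fit_gompertz(log_y)
upper_asymptote = 10 ** log_k
print(f"log Yc = {log_k:.6f} + ({log_a:.7f}) ({b:.7f})^X")
print(f"upper asymptote k = {upper_asymptote:,.0f} thousand pounds")
print(f"trend for 1919 = {gompertz_trend(log_k, log_a, b, 0):,.1f}")
```

Evaluating the trend at X = 0 gives the product of k and a, and letting X grow drives the exponent toward zero so that the trend approaches k.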


Chart 170. Growth Increments of a Gompertz Curve and of a Logistic Curve.
(Gompertz curve increments are first differences of trend values of Table 101,
while logistic curve increments are first differences of trend values of
Table 102. Trend values have been extrapolated to show better the shapes of
the curves. In each case vertical distances represent changes from preceding
year.)


Logistic curve. Another type of growth curve which has the same general
shape as the Gompertz curve is the logistic. It is, in fact, identical
with the modified exponential except that it is the first differences of
the reciprocals which are declining by a constant percentage. A modified
exponential may therefore be fitted by the method of partial totals to the
reciprocals of the Y values, and the reciprocals of the fitted values so
obtained may be taken as the trend values. The equation type may
therefore be

$$\frac{1}{Y_c} = k + ab^X.$$

More often, however, this curve is fitted by the method of selected points.
Such a process is more subjective, but is probably more generally used.
When fitted by this method, the equation type is

$$Y_c = \frac{k}{1 + e^{a + bX}}.$$

In this equation k, which is one of the constants to be discovered, is the
upper limit, and e is 2.71828, the base of the Naperian system of logarithms.
Since b is negative, the value of a + bX must eventually become negative
and $e^{a + bX}$ a very small fraction, so that Yc approaches the value k (is
asymptotic to k). Consequently k is the upper limit. On the other hand,
for large negative values of X, the denominator will become very large
and the trend value approach zero.

Although the logistic is like the Gompertz in important respects, there 
is one dissimilarity that is easy to observe. The first differences of the 
logistic curve when plotted produce a symmetrical curve that closely re- 
sembles a normal frequency distribution, while those of the Gompertz 
curve are skewed. Chart 170, on which are plotted (in part A) the first 
differences of the rayon data Gompertz trend and (in part B) those of the 
logistic trend of United States population, brings out this point very clearly. 

The logistic curve is fitted by a method that makes the curve pass
through three subjectively selected points equidistant from each other:
one near the beginning of the period, one in the middle, and one near the
end. The computation of the trend values for United States population
growth is shown in Table 102. The three selected years are 1800 ($x_0$),
1860 ($x_1$), and 1920 ($x_2$). The y values chosen are the geometric
three-decade averages centering on these periods. Averages were used in order
to eliminate abnormal values. The geometric mean was used in preference
to the arithmetic mean since the growth is more nearly straight line
geometrically than arithmetically. There is no certainty, however, that
this method is an improvement over the selection of values entirely from
inspection of the chart. The averages obtained are as follows:

      $y_0 = 5{,}325.$
      $y_1 = 30{,}408.$
      $y_2 = 106{,}079.$

TABLE 102

Logistic Curve Fitted to United States Population, 1790-1930

Source: United States Department of Commerce, Statistical Abstract of the
United States, 1937, p. 2.

* Each y value is the geometric mean of three values in column 4. Thus:

      $y_0 = (3{,}929 \times 5{,}308 \times 7{,}240)^{1/3} = 5{,}325.$
      $y_1 = (23{,}192 \times 31{,}443 \times 38{,}558)^{1/3} = 30{,}408.$
      $y_2 = (91{,}972 \times 105{,}711 \times 122{,}775)^{1/3} = 106{,}079.$

Designating by n the number of years from $x_0$ to $x_1$, or from $x_1$ to
$x_2$ (expressed in X units, so that n = 6 decades here), the formulae
required for computing the constants for the logistic curve are:†

$$k = \frac{2y_0 y_1 y_2 - y_1^2(y_0 + y_2)}{y_0 y_2 - y_1^2}$$

$$a = \log_e \frac{k - y_0}{y_0}$$

$$b = \frac{1}{n}\,\log_e \frac{y_0(k - y_1)}{y_1(k - y_0)}$$

Substituting in the first equation, we have:

$$k = \frac{2(5{,}325)(30{,}408)(106{,}079) - (30{,}408)^2(5{,}325 + 106{,}079)}{5{,}325(106{,}079) - (30{,}408)^2} = 190{,}830.35.$$

The second equation becomes

$$a = \log_e \frac{190{,}830.35 - 5{,}325}{5{,}325} = \log_e \frac{185{,}505.35}{5{,}325} = \log_e 34.836685.$$

However,

$$\log_e X = 2.302585 \log_{10} X.$$

Therefore

$$\log_e 34.836685 = 2.302585 \log_{10} 34.836685,$$

$$a = 2.302585 \log 34.836685 = 3.5506713.$$

Finally, substituting in the last equation

$$b = \frac{1}{6}\,\log_e \frac{5{,}325(190{,}830.35 - 30{,}408)}{30{,}408(185{,}505.349)} = -.3145945.$$

The trend equation therefore may be stated

$$Y_c = \frac{190{,}830.35}{1 + e^{3.550671 - .3145945X}}$$

with origin at 1800 and X units 10 years.

In obtaining the trend values, it is possible to save one column of
multiplications by simplifying the formula. If we designate by $\mu$ the
expression $e^{a + bX}$, the formula becomes

$$Y_c = \frac{k}{1 + \mu}.$$

In our equation

$$\mu = e^{3.550671 - .3145945X}$$

and

$$\log_{10} \mu = (3.550671 - .3145945X) \log_{10} e$$
$$= .43429(3.550671 - .3145945X)$$
$$= 1.542035 - .1366265X.$$

In Table 102 the values for $\mu$ are computed in columns 6, 7, and 8. A
final check on the computations may be had by comparing the Yc values
for $x_0$, $x_1$, and $x_2$ with the values of $y_0$, $y_1$, $y_2$, since the
curve must pass through the three selected points. The check marks in
column 10 of the table indicate perfect agreement.

† For the mathematical reasoning behind this type of curve, see Raymond Pearl,
Studies in Human Biology, Chapter XXIV, Williams and Wilkins Company,
Baltimore, 1924.
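
The method of selected points lends itself to a short sketch. The function names below are hypothetical; the three y values are the geometric averages given in the text (in thousands), and n is taken as 6 because X is measured in decades from 1800.

```python
import math


def fit_logistic_three_points(y0, y1, y2, n):
    """Fit Yc = k / (1 + exp(a + b*X)) through three equally spaced points.

    The points are (0, y0), (n, y1) and (2n, y2), with X measured from the
    first selected point.
    """
    k = (2 * y0 * y1 * y2 - y1 ** 2 * (y0 + y2)) / (y0 * y2 - y1 ** 2)
    a = math.log((k - y0) / y0)
    b = math.log(y0 * (k - y1) / (y1 * (k - y0))) / n
    return k, a, b


def logistic_trend(k, a, b, x):
    """Trend value for X periods after the origin."""
    return k / (1 + math.exp(a + b * x))


k, a, b = fit_logistic_three_points(5325, 30408, 106079, n=6)
print(f"k = {k:,.2f}, a = {a:.6f}, b = {b:.7f}")
# The fitted curve must reproduce the three selected points.
for x, year in ((0, 1800), (6, 1860), (12, 1920)):
    print(year, round(logistic_trend(k, a, b, x)))
```

Reproducing the three selected values is the same check that the text describes for column 10 of Table 102.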

The results of the curve fitting are shown in Chart 171. Since the
method of fitting is based on selected points, the fitted curve would of
necessity coincide with the values selected for those points. The chart,
however, shows extremely close relationship throughout. The trend has
been extended a number of decades in order to show more completely the
fundamental shape of the curve.

The logistic curve owes its name to a Belgian mathematician bj^ the 
name of Verhulst, who used it as early as 1838 as an expression of the law 
of population growth, and gave it that name. In recent years it has been 
used extensively by the biometricians Raymond Pearl and L. J. Reed, 
and is frequently called the PearhReed curve. They have used it to de- 
scribe the growth of an albino rat, a tadpole^s tail, the number of yeast 
cells in a nutritive solution, the number of fruit flies in a bottle (on a 
limited food supply), and, most interesting of all, the number of human 
beings in a geographical area. In each case the phenomenon measured 
is population growth, either the number of cells in an organism or the 
number of individuals in a region. The law of growth which the logistic 
curve describes is stated by Pearl as follows:^ 


In a spatially limited universe the amount of increase which occurs in 
any particular unit of time, at any point of the single cycle of growth, is 
proportional to two things, viz: (a) the absolute size already attained at 
the beginning of the unit interval under consideration, and (b) the 
amoxmt still unused or unexpended in the given universe (or area) of 
actual and potential resources for the support of growth. 


Raymond Pearl, The Biology of Population Growth, Alfred A. Knopf, New York, 
1925, p. 22. 

Chart 171. Logistic Trend Fitted to Population Growth, Continental United States, 
1790-1930. (Dotted lines 1770-1790 and 1930-2050 indicate extensions beyond data 
to show general shape of curve. Data of Table 102.) 

In the case of human populations some development may expand the 
available subsistence and allow a new cycle of growth. For instance, 
mankind may pass through a hunting stage, an agricultural stage, and 
an industrial stage. Each cultural epoch may then be described by a new 
logistic curve spliced onto the old one. Thus 

$$Y_c = k_1 + \frac{k_2}{1 + 10^{a + bX}}$$

describes a curve in which $k_1$ is the new lower limit and $k_1 + k_2$ the new 
upper limit. In this equation, $k_1$ is below the upper limit $k_0$ of the 
previous logistic and indicates the value at which the previous one was 
interrupted. 
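The behavior of the spliced curve at its two extremes can be checked numerically. The sketch below assumes the base-10 form of the logistic and uses purely hypothetical constants; the function name and the values of `k1`, `k2`, `a` and `b` are illustrative and do not come from any table in the text.

```python
def spliced_logistic(x, k1, k2, a, b):
    """Spliced logistic: Y_c = k1 + k2 / (1 + 10**(a + b*x)).

    k1 is the lower limit and k1 + k2 the upper limit; b must be
    negative for the curve to rise as x increases.
    """
    return k1 + k2 / (1.0 + 10.0 ** (a + b * x))


# Hypothetical constants, chosen only to illustrate the shape.
k1, k2, a, b = 100.0, 50.0, 1.0, -0.25

early = spliced_logistic(-40, k1, k2, a, b)   # far back in time: close to k1
middle = spliced_logistic(4, k1, k2, a, b)    # a + b*x = 0: halfway between limits
late = spliced_logistic(60, k1, k2, a, b)     # far ahead: close to k1 + k2

print(early, middle, late)
```

With these constants the curve starts from the level at which the earlier cycle was interrupted and flattens out again at the new ceiling.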

Apparently waves of immigration and human institutions do not change 
the fundamental shape of the curve, although the steepness of its slope 
may be modified somewhat. Also the growth may not be symmetrical: 
the point of inflection need not be halfway between the upper and the 
lower asymptotes, nor need the two parts of the curve be of the same 
shape. A skewed logistic may be obtained by a slight modification of the 
previous formulae: 

$$Y_c = \frac{k}{1 + 10^{a + bX + cX^2}}$$

The theory advanced by Raymond Pearl is not, however, universally 
accepted. Some argue that, although the logistic curve is appropriate 
enough for fruit flies in a bottle, its extension to human society is unwar- 
ranted. Human beings have, and exercise, the power of modifying their 
environment and rationally controlling their rate of reproduction. 

One of the chief objects of the logistic curve is the forecasting of future 
growth. Thus Table 102 shows the trend values extended through 1950. 
But it may also be used to estimate the population for earlier periods, 
before the existence of reliable records. Thus the population of the region 
later to become the United States is estimated in Table 102 to have been 
2,876,000 in 1780. Of course, extensions of the curve give reliable results 
only if there are no changes in the area involved and no new influences 
arise to affect the rate of population growth; and the accuracy varies in- 
versely with the extent of extrapolation. 

Use of arithmetic probability paper. Attention has already been called 
to the close resemblance between the first differences of the curves 
described in this section and the ordinary frequency polygon. Chart 170 
could easily be mistaken for two such curves. No doubt also the reader 
has noticed that the trend values when plotted on ordinary arithmetic 
paper are very similar to ogives, or cumulative frequency polygons. Since 
this is true, it might well be that some series, if plotted on arithmetic 
probability paper, would approximate a straight line. Arithmetic 
probability paper, however, is in terms of percentages, and either the scale of 
the probability paper must be converted into the units of the data being 
used, or the data must be converted into percentages. The latter is easier. 
Since 100 per cent is the theoretical upper limit of probability paper, it is 
necessary only to assume some upper limit for the time series in question 
and to express the value for each year as a percentage of this maximum. 

Chart 172. Estimating the Trend of United States Population by Use of Arithmetic 
Probability Paper. (Trial values of upper limit are shown to the right in the chart. 
Data of Table 103.) 

Using the population data as an illustration, let us assume that 
150,000,000 is the maximum figure for our population. Dividing each of 
our decennial population counts by this number gives the percentages of 
Table 103, column 3. These percentages are plotted on Chart 172. The 
result is a curve that is concave upward. The curve can be straightened 
out, however, by taking a larger upper limit. Thus upper limits, in turn, 
of 150, 200, 250, 300, 400, 500, and 1,000 millions (shown to the right in 
this chart) produce the different curves of Chart 172. The curves 
gradually become straighter as the one with 400 millions as a limit is 
approached, after which they begin to be slightly concave downward. Since 
the 400 line seems to be most nearly straight, a straight line is drawn 
through the points on this line and, in order to estimate future population, 
the line is extended through 1950. Readings from this line are recorded 
in column 5 of Table 103. The trend values are now obtained simply by 
multiplying these readings, expressed as decimals, by 400,000,000. 
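The graphic procedure can be imitated numerically. The sketch below is an illustration under stated assumptions, not the authors' method: the inverse of the normal curve stands in for the ruling of probability paper, and a correlation coefficient stands in for judging straightness by eye, so it need not single out the same upper limit that the eye would. The census counts are those of Table 103, column 2; the function names are hypothetical.

```python
from statistics import NormalDist

# Census counts of Table 103, column 2 (1790 through 1930).
years = list(range(1790, 1940, 10))
population = [3_929_000, 5_308_000, 7_240_000, 9_633_000, 12_866_000,
              17_069_000, 23_192_000, 31_443_000, 38_558_000, 50_156_000,
              62_948_000, 75_995_000, 91_972_000, 105_711_000, 122_775_000]


def percent_of_limit(counts, upper_limit):
    """Express each count as a percentage of an assumed upper limit."""
    return [100.0 * c / upper_limit for c in counts]


def probability_scale(percentages):
    """Normal deviates for the percentages (the vertical ruling of the paper)."""
    unit_normal = NormalDist()
    return [unit_normal.inv_cdf(p / 100.0) for p in percentages]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def straightness(upper_limit):
    """Correlation of year with the plotted values; 1.0 would be a straight line."""
    return correlation(years, probability_scale(percent_of_limit(population, upper_limit)))


def trend_value(reading_per_cent, upper_limit=400_000_000):
    """Convert a reading from the fitted line (in per cent) into persons."""
    return round(upper_limit * reading_per_cent / 100.0)


trial_limits = [150, 200, 250, 300, 400, 500, 1000]   # millions, as in Chart 172
for millions in trial_limits:
    print(millions, round(straightness(millions * 1_000_000), 5))
```

A reading of .90 per cent from the 400-million line, for example, converts to `trend_value(0.90)`, or 3,600,000 persons.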

Although the trend values are not greatly different from those obtained 
by the logistic curve until the curve is extended far beyond the data, the 
upper limit of population growth by the graphic method is more than 
double that obtained by the mathematical equation. Most experts on 
population growth would probably say that 190,000,000 is closer to the 
truth than 400,000,000. 


TABLE 103 

Estimating the Trend of United States Population from Probability Paper 

                      Population    Population    Trend readings   Yc 
                      as per cent   as per cent   from Chart 172   [400,000,000 × 
Year    Population    of            of            (per cent)       column 5] 
                      150,000,000   400,000,000 
(1)     (2)           (3)           (4)           (5)              (6) 

1790      3,929,000      2.61          0.98           .90            3,600,000 
1800      5,308,000      3.54          1.33          1.35            5,400,000 
1810      7,240,000      4.83          1.81          1.80            7,200,000 
1820      9,633,000      6.43          2.41          2.50           10,000,000 
1830     12,866,000      8.58          3.22          3.35           13,400,000 
1840     17,069,000     11.38          4.27          4.55           18,200,000 
1850     23,192,000     15.46          5.80          5.95           23,800,000 
1860     31,443,000     20.96          7.86          7.70           30,800,000 
1870     38,558,000     25.71          9.64          9.80           39,200,000 
1880     50,156,000     33.44         12.54         12.45           49,800,000 
1890     62,948,000     41.97         15.74         15.35           61,400,000 
1900     75,995,000     50.66         19.00         18.80           75,200,000 
1910     91,972,000     61.31         22.99         22.40           89,600,000 
1920    105,711,000     70.47         26.43         27.00          108,000,000 
1930    122,775,000     81.85         30.69         31.40          125,600,000 
1940        ...           ...           ...         36.15          144,600,000 
1950        ...           ...           ...         41.50          166,000,000 

Source: See Table 102. 


Thus Pascal K. Whelpton, a member of the American delegation at the 
Population Congress held in Paris in July 1937, is quoted as saying in an 
interview: 

United States population experts seem to agree that the United States 
will reach the climax of its population in the next generation. Perhaps 
the top figure will be 150,000,000. After that the population is bound to 
decrease. That it is already shrinking is evidenced by the fact that there 
are fewer children today in the first five grades of the public schools. 

The United States population has increased only 9,000,000 from 1930, 
owing to the practice of birth control and lessened immigration. Most 
of us agree it is a good thing, leading to a higher standard of living. 

As reported by the New York Times, July 31, 1937. 

Expert opinion also leans toward methods of population prediction 
which put less reliance on extending curves and more on analysis of the 
causal factors where these are known. The chief of these factors are: 

1. Births. The number of births in a country bears a relationship, not 
to the number of persons in the country, but to the number of women of 
child-bearing age. Thus the age and sex distribution of a country need 
be considered, as well as any trend in this ratio. 

2. Deaths. 

3. Immigration. 

4. Emigration. 

By considering such factors as these, including their estimated trends in 
the future, most statisticians obtain a maximum well under 200,000,000. 
It is also considered possible that the population of the United States may 
decrease before the year 2000, since the women of child-bearing age are 
not producing enough offspring to maintain themselves. 

The factors mentioned above can be utilized for the United States as a 
whole, but the last two factors, immigration and emigration, are not 
available for individual states and cities. In these cases, interpolation and 
extrapolation must be made by means of a trend equation or by reading 
from a curve. 

Objective Tests of Trends 

It must not be imagined that this chapter is at all exhaustive of the 
types of trends that may be utilized. However, a sufficient variety has 
been given to meet most of the needs for time series analysis. Since such 
a large number are available, how can we decide which to use? First of 
all, let it be repeated that we should select a trend which describes the 
forces sought to be measured. If the object is solely to obtain cyclical 
deviations, probably the trend should pass through the approximate center 
of each cycle. If the object is forecasting, a mathematical equation should 
be selected which, when extended, will conform to expectations dictated 
by logic. If, for instance, the series is such that it may logically be 
expected to flatten out, an asymptotic curve should be selected. If the 
object is solely historical study, the future behavior of the curve is not so 
important. 

Assuming that economic processes conform to some “law,” this law 
may perhaps be discovered by first smoothing the data somewhat and 
then applying certain objective tests: 

(1) If the first differences are constant, use a straight line. 

(2) If the second differences are constant, use a second degree curve. 

(3) If the third differences are constant, use a third degree curve. 

(4) If the first differences are changing by a constant percentage, use 
a modified exponential. 

(5) If the first differences resemble a normal curve, use a logistic. 

(6) If the first differences resemble a skewed frequency curve, use a 
Gompertz curve or a complex type of logistic. 

(7) If the first differences of the logarithms are constant, use an ex- 
ponential. (Fit a straight line to the logarithms.) 

(8) If the second differences of the logarithms are constant, fit a second 
degree curve to the logarithms. 

(9) If the first differences of the logarithms are changing by a constant 
percentage, use a Gompertz curve. 

(10) If the first differences of the reciprocals are changing by a con- 
stant percentage, use a logistic curve. 
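Most of the tests listed above are mechanical enough to be tried in code. The following is a minimal sketch, assuming an already smoothed, strictly positive series of at least five values; the function names, the tolerance and the order in which the tests are tried are illustrative choices, not the authors'. Tests (5) and (6) depend on visual resemblance to a frequency curve and are left out, and test (7) is tried before test (4) because a simple exponential would also satisfy (4).

```python
import math


def differences(values):
    """Successive differences: values[i+1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def ratios(values):
    """Successive ratios: values[i+1] / values[i]."""
    return [b / a for a, b in zip(values, values[1:])]


def is_constant(values, tol=1e-6):
    """True when the spread is negligible relative to the size of the values."""
    scale = max(abs(v) for v in values)
    return max(values) - min(values) <= tol * scale


def constant_percentage_change(values, tol=1e-6):
    """True when each value is the same fraction of the one before it."""
    if any(v == 0 for v in values):
        return False
    return is_constant(ratios(values), tol)


def suggest_trend(y):
    """Apply tests (1)-(4) and (7)-(10) to a smoothed, positive series."""
    d1 = differences(y)
    d2 = differences(d1)
    d3 = differences(d2)
    if is_constant(d1):
        return "straight line"                              # test (1)
    if is_constant(d2):
        return "second degree curve"                        # test (2)
    if is_constant(d3):
        return "third degree curve"                         # test (3)
    log_d1 = differences([math.log10(v) for v in y])
    if is_constant(log_d1):
        return "exponential"                                # test (7)
    if constant_percentage_change(d1):
        return "modified exponential"                       # test (4)
    if is_constant(differences(log_d1)):
        return "second degree curve fitted to logarithms"   # test (8)
    if constant_percentage_change(log_d1):
        return "Gompertz curve"                             # test (9)
    recip_d1 = differences([1.0 / v for v in y])
    if constant_percentage_change(recip_d1):
        return "logistic curve"                             # test (10)
    return "no simple type indicated"


# Hypothetical series built from known formulas, to show each test firing.
print(suggest_trend([3 + 2 * x for x in range(8)]))
print(suggest_trend([100 - 50 * 0.8 ** x for x in range(8)]))
print(suggest_trend([1 / (0.01 + 0.09 * 0.6 ** x) for x in range(8)]))
```

Real series never satisfy these conditions exactly, so in practice the tolerance would have to be far looser than the one used here, which suits only artificial data.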

It is not usually necessary to make these tests. Plotting the original 
data on arithmetic paper, semi-logarithmic paper, or probability paper 
may dictate the choice. It may be, however, that neither inspection of 
such charts nor the more objective tests we have described will be con- 
clusive. In the first place, the preliminary smoothing may not have been 
done properly. Secondly, the series may not conform to any simple mathe- 
matical description. In a dynamic world, forces in operation are seldom 
allowed to work themselves out before other factors make themselves felt. 
Consequently any type of mathematical trend may “work” only for a 
relatively short period. 

In case of doubt as to which of several trends (each of the same number 
of constants) to use, that one is to be preferred from which the sum of the 
squared deviations of the data is at a minimum. In making this com- 
parison, arithmetic curves should not be compared with those fitted to 
logarithms. 

Selected References 

H. G. Brunsman: Simplified Procedure in the Statistical Analysis of Time Series; 
Ohio State University Press, Columbus, Ohio, 1930. Discusses summation 
method. 

G. R. Davies and W. F. Crowder: Methods of Statistical Analysis in the Social 
Sciences, Chapter VI; John Wiley and Sons, New York, 1933. Includes 
summation method of fitting polynomials. 

R. A. Fisher: Statistical Methods for Research Workers (Seventh Edition), pp. 148- 
158; Oliver and Boyd, Edinburgh, 1936. A discussion of orthogonal 
polynomials and the fitting of polynomials by a summation method. 

Simon Kuznets: Secular Movements in Production and Prices; Houghton Mifflin Co., 
Boston, 1930. The concepts of primary and secondary trends are explained 
and illustrated. 





F. C. Mills: Statistical Methods Applied to Economics and Business (Revised 
Edition), pages 253-279 and Appendix D; Henry Holt and Co., New York, 
1938. Contains a section on selection of curve type. Appendix D illustrates 
another use of the modified exponential, and use of reciprocals in fitting a 
logistic. 

Raymond Pearl: Studies in Human Biology, Chapters XXIV, XXV; Williams and 
Wilkins, Baltimore, 1924. Chapter XXIV explains the theory of the logistic 
curve, while Chapter XXV shows its application to the growth of human 
populations. 

C. H. Richardson: An Introduction to Statistical Analysis, pages 169-200; Harcourt, 
Brace and Co., New York, 1934. Gives properties and methods of fitting a 
number of types of curves. 

T. R. Running: Empirical Formulas; John Wiley and Sons, New York, 1917. The 
properties of a considerable number of curves are given, as well as directions 
for fitting them. Graphic devices are frequently employed. 

Max Sasuly: Trend Analysis of Statistics; Brookings Institution, Washington, 1934. 
An advanced treatise for students trained in mathematics. Some of the material 
has to do with cyclical movements. 

J. G. Smith: Elementary Statistics, Chapters VI, X, XII, XIV; Henry Holt and Co., 
New York, 1934. Chapter VI is an elementary treatment of the mathematical 
meaning of certain types of equations. Chapter X gives a brief treatment of 
the method of least squares and of curve fitting. In Chapter XII, empirical 
trends, their validity, and methods of fitting are discussed. Chapter XIV, 
pp. 260-262, explains how to convert second degree trends from an annual to a 
monthly basis. 

G. W. Snedecor: Statistical Methods Applied to Experiments in Agriculture and 
Biology, pages 279-289; Collegiate Press, Ames, 1937. Gives directions for 
fitting orthogonal polynomials as high as seventh degree. 

Carl Snyder: Business Cycles and Business Measurements, Chapter II; Macmillan 
Co., New York, 1927. Shows the results of fitting trends to a variety of data. 

F. F. Stephen: “Summation Methods in Fitting Parabolic Curves,” Journal of 
the American Statistical Association, Vol. XXVIII, December 1932, pp. 413-423. 



CHAPTER XVII 

PERIODIC MOVEMENTS 


As indicated in Chapter XIV, there are many types of periodic move- 
ments, including those that repeat themselves daily, weekly, monthly, or 
annually. In this chapter most of the measurements will be of monthly 
movements within a year, commonly known as seasonal movements. The 
principles laid down can easily be applied to the various other types of 
periodic movements. It will be the plan of this discussion to start with 
data which lend themselves to very simple treatment, and gradually to 
introduce more complex methods as they are required. The treatment of 
seasonal movements that vary in their pattern from year to year will, 
however, be reserved for the following chapter, as will the measurement 
of weekly seasonal movements which involves special problems. In 
general, all the methods involve averaging separately the values of the 
different Januaries, Februaries, etc., but differ chiefly in the degree to which 
the data are refined before being averaged. 

TABLE 104 

Computation of Indexes of Intra-Week Variation of Circulation of Books at the 
University of North Carolina Reserve Room, Spring Quarter, 1937 

A. Unadjusted data (number of books) 

Averages of unadjusted data. When the data do not contain cyclical 
movements or trend to any appreciable extent, it will suffice to average 
the data without making any previous adjustment. An illustration of 
such data is the circulation of books in the reserve room of the University 
of North Carolina, which is shown in part A of Table 104. From this 
table were excluded data for part weeks and the week preceding spring 
vacation, the latter because circulation was unusually low during the last 
two days of school. Below each column of data is given the average of 
that column. The averages, one for each day of the week, constitute a 
measure of the intra-week fluctuation in circulation of books. For 
convenience, however, it may be desirable to express this measure in 
percentage form. By dividing each of the seven daily averages by the average 
weekly circulation and multiplying by 100, we find the per cent of total 
weekly circulation for each day of the week. Thus, only 5.4 per cent of 
the total normally occurs on Sunday, while on Monday it averages 19.8 
per cent. A more usual method of stating the index is to express each 
daily average as the percentage of average daily circulation during the 
week. Thus, in part A of the table, each number in row 8 is divided by 
469.53 (= 3,286.71 ÷ 7), and the result set down as a percentage in row 
10. Still other methods of stating the index are occasionally used. 
Sometimes the figures are put in terms of deviations above or below normal, 
either in absolute terms or in percentages. 
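The two ways of stating the index differ only in the divisor. The sketch below uses hypothetical daily averages, since the body of Table 104 is not reproduced here; the function names are likewise illustrative. Only the divisor 469.53 is taken from the text.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

# Hypothetical average circulation for each day of the week (number of books).
daily_average = [180.0, 650.0, 560.0, 540.0, 520.0, 480.0, 360.0]


def percent_of_weekly_total(averages):
    """Each daily average as a per cent of the week's total; sums to 100."""
    total = sum(averages)
    return [100.0 * a / total for a in averages]


def percent_of_average_day(averages):
    """Each daily average as a per cent of the average day; sums to 700."""
    mean_day = sum(averages) / len(averages)
    return [100.0 * a / mean_day for a in averages]


share = percent_of_weekly_total(daily_average)
index = percent_of_average_day(daily_average)

for day, s, i in zip(DAYS, share, index):
    print(f"{day}  {s:6.1f}  {i:6.1f}")

# The divisor quoted in the text: average daily circulation during the week.
text_divisor = round(3286.71 / 7, 2)
```

The second form is simply seven times the first, which is why an average day stands at 100 per cent.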

Chart 173. Two Measures of Typical Intra-Week Circulation of Books at the 
Reserve Room of the University of North Carolina, Spring Quarter, 1937. (Data of 
Table 104.) 

Percentages of simple averages. It is a slight improvement of 
technique to express the circulation of each day of each week as a percentage 
of average circulation of that week. This gives each week equal 
importance in determining the index of variation, instead of extra weight being 
accorded to weeks having large circulation. It might be thought offhand 
that such extra weight is highly desirable, but it does not necessarily follow 
that weeks of large circulation are weeks with typical circulation pattern. 
Furthermore, by putting the data in percentage form, we can more readily 
detect erratic variations from the typical weekly pattern. A study of 
such percentages for each day might lead one to select some average other 
than the arithmetic mean. Thus, in the present instance, use of the 
median gives the broken curve of Chart 173, which differs somewhat from 
the solid line representing the means of the unadjusted data. 
Computations are shown in part B of Table 104. Since medians were used, it was 
necessary to make a slight adjustment in order to have the index total 
700 per cent and thus average 100 per cent. 
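The steps just described can be sketched as follows. The weekly figures are hypothetical and the variable names are illustrative; only the procedure (per cent of each week's own average, median for each day, adjustment to a total of 700) follows the text.

```python
from statistics import median

# Hypothetical circulation, one row per week, Sunday through Saturday.
weeks = [
    [170, 640, 560, 530, 510, 470, 350],
    [190, 700, 600, 580, 540, 500, 390],
    [150, 560, 500, 470, 450, 420, 320],
    [200, 690, 610, 560, 560, 490, 370],
]


def percent_of_week_average(week):
    """Express each day as a per cent of the average day of its own week."""
    average = sum(week) / len(week)
    return [100.0 * value / average for value in week]


relatives = [percent_of_week_average(week) for week in weeks]

# Median of the percentages for each day of the week (one column at a time).
medians = [median(day) for day in zip(*relatives)]

# Medians need not total 700, so scale them until they do.
factor = 700.0 / sum(medians)
index = [m * factor for m in medians]

print([round(v, 1) for v in index])
```

Because every week is first reduced to percentages of its own average, a week of heavy circulation counts for no more than a week of light circulation.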

Trend adjustment for averages. Whenever, as is usual, the data have 
a secular trend, any seasonal index following the simple methods just 
described will have an upward or downward bias, depending on the direction 
of the trend. Thus, if we were dealing with monthly data and if the 
trend were upward, each December would be higher than the preceding 
January by an amount equal to 11/12 of the annual growth (assuming a linear 
trend), even if there were no genuine seasonal movement. Because of 
this fact the seasonal index, which is supposed to exhibit seasonal 
movements only, would slope upward; and if there were a true seasonal 
movement, the December index number would be too high relative to the 
January index number by 11/12 of the annual growth. 

In order to correct for trend, we can remove its influence either before 
or after averaging the data. The easiest method is to remove it after the 
data have been averaged. In Table 105, average electric power produc- 
tion per calendar day over the 10 years has been computed for each month, 
and is shown in row 11. Immediately below that are given the monthly 
trend values which refer to the 12-month period that is central in point of 
time of the entire series. (The trend equation is Yc = 192.86 + 1.48636X, 
with origin at 1925-1926 and X units of one month.) These trend values 
may be thought of as the trend values of a typical year. Dividing the 
monthly averages by these trend values eliminates the effect of the secular 
trend, leaving the estimate of seasonal movements shown in row 13. The 
seasonal index will average approximately 100 per cent, since the total of 
row 12 is the same as that of row 11. 
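A minimal sketch of this division method follows. The constants 192.86 and 1.48636 are those of the trend equation quoted above; the monthly averages are hypothetical, and placing the twelve months at X = -5.5 through +5.5 is this sketch's reading of a typical year centered on the origin, not something stated in so many words in the text.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

A, B = 192.86, 1.48636   # constants of the trend equation quoted in the text


def typical_year_trend(a=A, b=B):
    """Trend values at the middle of each month, X running from -5.5 to +5.5."""
    return [a + b * (m - 5.5) for m in range(12)]


# Hypothetical monthly averages of production per calendar day.
monthly_average = [196, 192, 188, 186, 185, 187, 186, 190, 193, 199, 203, 209]

trend = typical_year_trend()

# Dividing each average by its trend value removes the upward drift.
seasonal_index = [100.0 * avg / t for avg, t in zip(monthly_average, trend)]

for month, value in zip(MONTHS, seasonal_index):
    print(f"{month}  {value:6.1f}")
```

Since the trend values total very nearly the same as the monthly averages, the resulting index averages close to 100 per cent without further adjustment.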

* 100 × (row 11 ÷ row 12). 
† Row 15 multiplied by .518538 = 1,200.0 ÷ 2,314.2. 
Source: Original data from United States Department of Commerce, Survey of 
Current Business, 1936 Supplement, p. 85. For method of computing production 
per calendar day in 1930, see Table 78, first four columns. 

It is perhaps easier to make the trend correction in another fashion. 
Row 14 shows the cumulative trend increments as measured from the 
middle of a calendar year (that is, midway between June and July). Since 
the monthly trend increment (b in the trend equation) is 1.48636, the value 
referring to June is minus one-half of that amount, and July plus one-half 
that amount, so that July is larger than June by the amount of b; likewise 
August is larger than July by this amount, as is each month with respect 
to the preceding. The trend element present in row 14 is now subtracted 
and the results recorded in row 15. This procedure, however, does not 
give a seasonal index expressed as percentages. Were the seasonal 
estimates percentages of normal, they should average 100 per cent, and total 
1,200 per cent. Instead, they total 2,314.2 (millions of kilowatt hours). 
Each of the numbers in row 15 can now be expressed as a percentage of 
average by dividing by 1.9285, since 192.85 is the average of the values 
in row 15. Mechanically it is easier to make the adjustment in a slightly 
different manner. Since the total of row 15 is 2,314.2/1,200.0 of the value 
desired, it is necessary only to multiply each of the numbers in this row by 
1,200.0/2,314.2, or .518538. This is done in row 16. The method described 
in this paragraph ordinarily is not so logical as the method that 
eliminates the trend by dividing, since a time series is probably made up 
of T × C × S × I (Trend × Cycle × Seasonal × Irregular) rather than 
T + C + S + I. However, the discrepancy in the results usually, as in 
this case, is negligible. 
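The subtractive correction can be sketched in the same way. The increment 1.48636 and the totals 1,200.0 and 2,314.2 come from the text; the monthly averages are hypothetical and the variable names are illustrative.

```python
B = 1.48636   # monthly trend increment quoted in the text

# Hypothetical monthly averages (they stand in for row 11 of Table 105).
monthly_average = [196, 192, 188, 186, 185, 187, 186, 190, 193, 199, 203, 209]

# Cumulative trend increments measured from midway between June and July:
# June is -b/2, July is +b/2, and each month exceeds the one before by b.
increments = [B * (m - 5.5) for m in range(12)]

# Subtract the trend element, then scale so that the index totals 1,200.
detrended = [avg - inc for avg, inc in zip(monthly_average, increments)]
factor = 1200.0 / sum(detrended)
seasonal_index = [value * factor for value in detrended]

# The multiplier worked out in the text for a row total of 2,314.2.
text_factor = round(1200.0 / 2314.2, 6)

print([round(v, 1) for v in seasonal_index])
print(text_factor)
```

The increments cancel out over the year, so subtracting them leaves the annual total unchanged; only the single multiplication is needed to put the results in percentage form.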

The methods of adjustment for trend just described are not satisfactory 
when trend is an important component of the series. In years when trend 
is high, absolute values of the data will be large and these large values will 
have a disproportionate effect upon the monthly means. These methods 
are further limited in their usefulness to data the trend of which is linear. 

Percentages of trend. A more satisfactory method of eliminating the 
disturbing element of trend is to compute percentages of monthly trend 
values before averaging the individual months.¹ Since one step in time 
series analysis frequently consists in computing the trend, the energy used 
in computing trend values is not wasted. To compute percentages of 
trend, we merely divide the original data by the trend values separately 
for each month and multiply by 100. The results of such computations 
for electric power production, 1921-1930, are shown in Table 106. 

¹ This method is sometimes referred to as the Falkner method. See “The 
Measurement of Seasonal Variation,” by Helen D. Falkner, Journal of the American 
Statistical Association, June 1924, pp. 167-179. 

* Row 11 multiplied by correction factor .999417 = 1,200.00 ÷ 1,200.70. 
Source: Table 106. 

Since, however, this is a somewhat more refined method than the two 
previously described, it may be worth while to array the data before 
deciding upon the method of averaging. Inspection of the array of Table 
107 in conjunction with Table 106 reveals that most of the high values 
occurred in 1923 or 1927, which were years of prosperity, while the low 
values tended to occur in the depression years of 1924 or 1930. This is 
not a surprising result; it indicates that the cyclical movements (though 
rather mild) are more powerful in this series than the irregular movements. 
The arithmetic mean is admirably adapted to averaging out random 
variations but not for taking care of cyclical movements, which, by their nature, 
are not distributed in random fashion. On the other hand, the median is 
better suited for averaging such data. Although other types of averages 
might also be satisfactory for this problem, the median is often selected 
because of its simplicity. The median of each column is recorded in row 
11 of Table 107, and the index adjusted to total 1,200.0 in row 12. The 
adjustment was made by multiplying row 11 by 1,200.00 ÷ 1,200.70, 
or .999417. 
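The percentage-of-trend method with a median can be sketched as follows. Everything in the data is hypothetical: a linear trend, an assumed seasonal pattern and a mild cyclical factor for each of five years are multiplied together to make the series, so that the result can be compared with the pattern that was put in. Only the target total of 1,200 and the factor .999417 come from the text.

```python
from statistics import median

# Hypothetical seasonal pattern (per cent of normal, totalling 1,200).
pattern = [104, 101, 99, 97, 96, 97, 96, 98, 100, 103, 104, 105]
# Hypothetical cyclical factor for each of five years.
cycle = [1.03, 0.97, 1.01, 0.99, 1.02]

# Hypothetical linear trend and the series built from T x C x S.
trend = [[150.0 + 1.5 * (12 * y + m) for m in range(12)] for y in range(5)]
data = [[trend[y][m] * pattern[m] / 100.0 * cycle[y] for m in range(12)]
        for y in range(5)]


def percent_of_trend(data_rows, trend_rows):
    """Divide each original item by its trend value and multiply by 100."""
    return [[100.0 * d / t for d, t in zip(d_row, t_row)]
            for d_row, t_row in zip(data_rows, trend_rows)]


def median_seasonal_index(percentages):
    """Median of each month's column, adjusted to total 1,200."""
    medians = [median(column) for column in zip(*percentages)]
    factor = 1200.0 / sum(medians)
    return [m * factor for m in medians], factor


index, factor = median_seasonal_index(percent_of_trend(data, trend))

# The correction factor worked out in the text for medians totalling 1,200.70.
text_factor = round(1200.0 / 1200.70, 6)

print([round(v, 1) for v in index])
```

In this artificial case the adjusted medians return the assumed pattern, because the cyclical factor of the median year is removed by the final scaling.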

This method of computing seasonals has something to commend it, since 
it is one of the simplest and easiest of methods. It is subject to the 
criticism, however, that it attempts to average out cyclical movements instead 
of eliminating them before the averaging process. An average cannot be 
expected to accomplish this if the cycles are pronounced, particularly if 
the period covered is short. Consequently it is most appropriately used 
when cyclical movements are unimportant relative to seasonal movements. 
Finally, the trend should be fitted to a period coinciding with that of the 
seasonal measured. These limitations, of course, apply also to the methods 
previously discussed, which eliminate the trend after averaging the data. 
The use of this method, however, is not limited to data where the trend is 
linear. 

Chart 174. Seasonal Indexes of Electric Power Production per Calendar Day, by Two 
Methods, 1921-1930. (Data of Tables 105 and 107.) 

In Chart 174 the seasonal index based on percentages of trend is com- 
pared with the index obtained by dividing by trend after averaging the 
raw data. Although the percentage of trend method gives more accurate 
results, the difference between the two indexes is rather slight. The 
greater accuracy of the per cent of trend method is probably due as much 
to the use of the median (instead of the arithmetic mean) as it is to the 
difference in the general method. 

Percentages of 12-month moving average. In Chapter XV it was 
stated that moving averages iron out periodic movements if the moving 
average has the same number of months as the periodic movements sought 
to be eliminated. A 12-month moving average, therefore, should entirely 
eliminate seasonal movements if they are of constant pattern and 
amplitude. Using the magazine advertising data, this moving average can be 
obtained by averaging together the first twelve months (January 1921 to 
December 1921, inclusive), then the second twelve months (February 1921 
to January 1922, inclusive), and so on. Results are obtained most easily, 
however, as follows: 

(1) Total separately the items for each calendar year and record these 
totals between June and July of their respective years. In Tables 108 and 
109 these totals are starred. 

(2) From the 1921 calendar year figure subtract the figure for January 
1921, add the figure for January 1922, and sub-total. From this sub-total 
subtract February 1921, add February 1922, and again sub-total. Continue 
this process. When the figure to be recorded between June and July 1922 
is reached, it should check with the total already recorded as the total for 
the calendar year 1922. This procedure is carried through until all the 
moving total figures are obtained. The adding machine slip for the first 15 
of the 12-month moving totals will appear as shown. 

Subtract       Add            Sub-total 
                              22,271.00 * 
1,979.00-      1,632.00       21,924.00 S 
1,981.00-      1,768.00       21,711.00 S 
2,005.00-      1,922.00       21,628.00 S 
2,099.00-      2,171.00       21,700.00 S 
2,145.00-      2,215.00       21,770.00 S 
1,933.00-      2,046.00       21,883.00 S 
1,573.00-      1,705.00       22,015.00 S 
1,402.00-      1,566.00       22,179.00 S 
1,620.00-      1,940.00       22,499.00 S 
1,824.00-      2,470.00       23,145.00 S 
1,903.00-      2,466.00       23,708.00 S 
1,807.00-      2,464.00       24,365.00 S   (Checked total, starred in Table 108) 
1,632.00-      2,093.00       24,826.00 S 
1,768.00-      2,301.00       25,559.00 S 

Portion of Adding Machine Tape Showing Twelve-Month Moving Totals. 

The first moving total figure was placed between June and July 1921, 
since the middle of that 12-month period is at midnight of June 30, 1921. 
Likewise, the second such period centers at the end of July 1921. But the 
monthly data are for calendar months, centering at the fifteenth of each 
month. It is necessary, therefore, not only to convert the moving totals 
into moving averages, but to center them at the middle of each month. 
Two procedures will be explained for accomplishing this result. 

TABLE 108 

Computation of Centered 12-Month Moving Average, and Percentages of Moving 
Average for United States Magazine Advertising, 1921-1922 

Columns: (1) Year and month; (2) Magazine advertising (thousands of lines); 
(3) 12-month moving total; (4) 12-month moving average [Col. 3 ÷ 12]; 
(5) 2-month moving total of Col. 4; (6) Centered 12-month moving average 
[Col. 5 ÷ 2]; (7) Per cent of 12-month moving average [Col. 2 ÷ Col. 6]. 
Rows: January 1921 through December 1922. 

* Calendar year totals. 
Source: See Table 109. 

The longer, although more easily understood, method is illustrated for 
1921 and 1922 in Table 108. Divide each of the moving totals by 12; or, 
to save time, multiply by .0833333, the reciprocal of 12. This process 
obtains the moving averages. They are centered by taking a 2-month 
moving average of the 12-month moving averages. This is done by taking 
a 2-month moving total of column 4 and dividing by 2. The results, 
recorded in column 6, may be called a centered 12-month moving average.² 

A short cut method which gives precisely the same results is illustrated 
in Table 109. In columns 4 and 5 of this table we center the moving 
totals, rather than the moving averages, and convert them into moving 
averages by multiplying by .0416667, the reciprocal of 24. The merit of 
this method is that one column of divisions (multiplications by reciprocals) 
is eliminated. 
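The two procedures can be checked against each other with a short sketch. It uses the 1921 and 1922 figures of Table 109, column 2; the function names are illustrative only and are not part of the text.

```python
import math

# Magazine advertising, thousands of lines (Table 109, column 2).
data_1921 = [1979, 1981, 2005, 2099, 2145, 1933, 1573, 1402, 1620, 1824, 1903, 1807]
data_1922 = [1632, 1768, 1922, 2171, 2215, 2046, 1705, 1566, 1940, 2470, 2466, 2464]
series = data_1921 + data_1922


def moving_totals(values, span=12):
    """Sum of every run of `span` consecutive values."""
    return [sum(values[i:i + span]) for i in range(len(values) - span + 1)]


def centered_long(values):
    """Long method: divide each total by 12, then average adjacent pairs."""
    averages = [t / 12 for t in moving_totals(values)]
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]


def centered_short(values):
    """Short method: add adjacent totals, then divide once by 24."""
    totals = moving_totals(values)
    return [(a + b) / 24 for a, b in zip(totals, totals[1:])]


long_result = centered_long(series)
short_result = centered_short(series)

# The first centered value belongs to the seventh month (July 1921).
for offset, (lo, sh) in enumerate(zip(long_result, short_result)):
    print(offset + 7, round(lo, 1), round(sh, 1))
```

With 24 months of data there are 13 moving totals and therefore 12 centered averages, running from July 1921 through June 1922; six months are lost at each end.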

These results, together with the original data, are plotted in Chart 175A.

The next step is to divide the original data by the centered 12-month
moving average in order to express them as percentages of this average. This
is done in Table 109, with results recorded in column 6. The values are the
same as those of Table 108, column 7. These results are shown by the
solid line of Chart 175B. The logic of the procedure is as follows:
Time series are assumed to be composed of T X C X S X I (Trend X
Cycle X Seasonal X Irregular). The 12-month moving average is a rough
estimate of T X C because the 12-month average smooths out seasonal
movements and, for the most part, irregular movements, since the latter
are largely movements of small amplitude and short duration.³ If now
we divide the original data by the 12-month moving average, we have an
estimate of the seasonal and irregular movements combined:

$$\frac{T \times C \times S \times I}{T \times C} = S \times I.$$
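The reasoning can be illustrated on an artificial series. The sketch below assumes a flat Trend X Cycle level of 2,000 (a made-up figure), imposes the seasonal index obtained later in the chapter by the moving average method, and leaves out irregular movements entirely. Under those assumptions the centered 12-month moving average returns the level and the percentages return the seasonal pattern.

```python
import math

# Seasonal index (per cent) by the moving average method; it totals 1,200.
seasonal = [84.8, 97.2, 106.6, 118.2, 113.5, 102.6,
            81.6, 72.0, 91.2, 111.9, 114.1, 106.3]

# Hypothetical series: constant Trend x Cycle level, seasonal imposed,
# no irregular component (I = 1), three years long.
level = 2000.0
series = [level * s / 100 for _ in range(3) for s in seasonal]


def centered_moving_average(values):
    """Centered 12-month moving average; None where it cannot be computed."""
    totals = [sum(values[i:i + 12]) for i in range(len(values) - 11)]
    centered = [(a + b) / 24 for a, b in zip(totals, totals[1:])]
    return [None] * 6 + centered + [None] * 6


moving_avg = centered_moving_average(series)

# Original data as percentages of the moving average: an estimate of S x I.
percentages = [
    None if m is None else 100 * y / m
    for y, m in zip(series, moving_avg)
]
```

With real data the recovery is only approximate, because the moving average does not follow trend and cycle exactly and the irregular movements remain in the percentages.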

It is well at this point to take note of the progress that has been made 
by comparing sections A and B of Chart 176. Section A is a rearrange- 

² Some statisticians do not consider the added accuracy obtained by centering to be
worth the added effort. The difference in results in the seasonal index is usually very
slight.

³ Actually the irregularities are not entirely smoothed out; on the other hand, some
of the cyclical movements are partly smoothed out, so that the amplitude of the cyclical
movements is slightly reduced. Consequently some statisticians smooth the moving
average curve further by inspection and also alter it slightly, particularly at cyclical
peaks and troughs, so as to obtain a better estimate of T X C. These adjusted data
are then used in subsequent steps for obtaining the seasonal index.



TABLE 109

Short Method of Computing Centered 12-month Moving Average, and of
Percentages of Moving Average for United States Magazine
Advertising, 1921-1930

(Each entry in column 3 falls midway between two months; it is shown here on the line of the earlier month.)

Year and     Magazine     12-month   2-month    Centered        Per cent of
month        advertising  moving     moving     12-month        12-month
             (thousands   total      total of   moving average  moving average
             of lines)               Col. 3     [Col. 4 ÷ 24]   [Col. 2 ÷ Col. 5]
(1)          (2)          (3)        (4)        (5)             (6)

1921
January      1,979        ...        ...        ...               ...
February     1,981        ...        ...        ...               ...
March        2,005        ...        ...        ...               ...
April        2,099        ...        ...        ...               ...
May          2,145        ...        ...        ...               ...
June         1,933        22,271*    ...        ...               ...
July         1,573        21,924     44,195     1,841            85.4
August       1,402        21,711     43,635     1,818            77.1
September    1,620        21,628     43,339     1,806            89.7
October      1,824        21,700     43,328     1,805           101.1
November     1,903        21,770     43,470     1,811           105.1
December     1,807        21,883     43,653     1,819            99.3

1922
January      1,632        22,015     43,898     1,829            89.2
February     1,768        22,179     44,194     1,841            96.0
March        1,922        22,499     44,678     1,862           103.2
April        2,171        23,145     45,644     1,902           114.1
May          2,215        23,708     46,853     1,952           113.5
June         2,046        24,365*    48,073     2,003           102.1
July         1,705        24,826     49,191     2,050            83.2
August       1,566        25,359     50,185     2,091            74.9
September    1,940        25,994     51,353     2,140            90.7
October      2,470        26,786     52,780     2,199           112.3
November     2,466        27,421     54,207     2,259           109.2
December     2,464        28,007     55,428     2,310           106.7

1923
January      2,093        28,452     56,459     2,352            89.0
February     2,301        28,750     57,202     2,383            96.6
March        2,557        29,087     57,837     2,410           106.1
April        2,963        29,490     58,577     2,441           121.4
May          2,850        29,924     59,414     2,476           115.1
June         2,632        30,233*    60,157     2,507           105.0
July         2,150        30,421     60,654     2,527            85.1
August       1,864        30,677     61,098     2,546            73.2
September    2,277        31,035     61,712     2,571            88.6
October      2,873        31,345     62,380     2,599           110.5
November     2,900        31,483     62,828     2,618           110.8
December     2,773        31,681     63,164     2,632           105.4





TABLE 109 (Continued)

Short Method of Computing Centered 12-month Moving Average, and of
Percentages of Moving Average for United States Magazine
Advertising, 1921-1930

(Each entry in column 3 falls midway between two months; it is shown here on the line of the earlier month.)

Year and     Magazine     12-month   2-month    Centered        Per cent of
month        advertising  moving     moving     12-month        12-month
             (thousands   total      total of   moving average  moving average
             of lines)               Col. 3     [Col. 4 ÷ 24]   [Col. 2 ÷ Col. 5]
(1)          (2)          (3)        (4)        (5)             (6)

1924
January      2,281        31,548     63,229     2,635            86.6
February     2,557        31,503     63,051     2,627            97.3
March        2,915        31,468     62,971     2,624           111.1
April        3,273        31,374     62,842     2,618           125.0
May          2,988        31,374     62,748     2,615           114.3
June         2,830        31,442*    62,816     2,617           108.1
July         2,017        31,272     62,714     2,613            77.2
August       1,819        31,228     62,500     2,604            69.9
September    2,242        31,025     62,253     2,594            86.4
October      2,779        30,703     61,728     2,572           108.0
November     2,900        30,569     61,272     2,553           113.6
December     2,841        30,374     60,943     2,539           111.9

1925
January      2,111        30,425     60,799     2,533            83.3
February     2,513        30,484     60,909     2,538            99.0
March        2,712        30,727     61,211     2,550           106.4
April        2,951        31,010     61,737     2,572           114.7
May          2,854        31,354     62,364     2,599           109.8
June         2,635        31,473*    62,827     2,618           100.6
July         2,068        31,747     63,220     2,634            78.5
August       1,878        32,088     63,835     2,660            70.6
September    2,485        32,406     64,494     2,687            92.5
October      3,062        32,798     65,204     2,717           112.7
November     3,244        33,180     65,978     2,749           118.0
December     2,960        33,569     66,749     2,781           106.4

1926
January      2,385        33,869     67,438     2,810            84.9
February     2,854        34,172     68,041     2,835           100.7
March        3,030        34,518     68,690     2,862           105.9
April        3,343        34,900     69,418     2,892           115.6
May          3,236        35,242     70,142     2,923           110.7
June         3,024        35,491*    70,733     2,947           102.6
July         2,368        35,642     71,133     2,964            79.9
August       2,181        35,793     71,435     2,976            73.3
September    2,831        36,018     71,811     2,992            94.6
October      3,444        36,172     72,190     3,008           114.5
November     3,586        36,513     72,685     3,029           118.4
December     3,209        36,501     73,014     3,042           105.5





TABLE 109 (Continued)

Short Method of Computing Centered 12-month Moving Average, and of
Percentages of Moving Average for United States Magazine
Advertising, 1921-1930

(Each entry in column 3 falls midway between two months; it is shown here on the line of the earlier month.)

Year and     Magazine     12-month   2-month    Centered        Per cent of
month        advertising  moving     moving     12-month        12-month
             (thousands   total      total of   moving average  moving average
             of lines)               Col. 3     [Col. 4 ÷ 24]   [Col. 2 ÷ Col. 5]
(1)          (2)          (3)        (4)        (5)             (6)

1927
January      2,536        36,553     73,054     3,044            83.3
February     3,005        36,601     73,154     3,048            98.6
March        3,255        36,532     73,133     3,047           106.8
April        3,497        36,499     73,031     3,043           114.9
May          3,577        36,486     72,985     3,041           117.6
June         3,012        36,453*    72,939     3,039            99.1
July         2,420        36,364     72,817     3,034            79.8
August       2,229        36,205     72,569     3,024            73.7
September    2,762        36,162     72,367     3,015            91.6
October      3,411        36,340     72,502     3,021           112.9
November     3,573        36,198     72,538     3,022           118.2
December     3,176        36,247     72,445     3,019           105.2

1928
January      2,447        36,410     72,657     3,027            80.8
February     2,846        36,339     72,749     3,031            93.9
March        3,212        36,382     72,721     3,030           106.0
April        3,675        36,470     72,852     3,036           121.0
May          3,435        36,383     72,853     3,036           113.1
June         3,061        36,379*    72,762     3,032           101.0
July         2,583        36,616     72,995     3,041            84.9
August       2,158        36,928     73,544     3,064            70.4
September    2,805        37,317     74,245     3,094            90.7
October      3,499        37,724     75,041     3,127           111.9
November     3,486        38,164     75,888     3,162           110.2
December     3,172        38,650     76,814     3,201            99.1

1929
January      2,684        38,931     77,581     3,233            83.0
February     3,158        39,203     78,134     3,256            97.0
March        3,601        39,560     78,763     3,282           109.7
April        4,082        39,821     79,381     3,308           123.4
May          3,875        40,163     79,984     3,333           116.3
June         3,547        40,606*    80,769     3,365           105.4
July         2,864        40,427     81,033     3,376            84.8
August       2,430        40,293     80,720     3,363            72.3
September    3,162        40,108     80,401     3,350            94.4
October      3,760        39,903     80,011     3,334           112.8
November     3,828        39,667     79,570     3,315           115.5
December     3,615        39,474     79,141     3,298           109.6

TABLE 109 (Continued)

Short Method of Computing Centered 12-month Moving Average, and of
Percentages of Moving Average for United States Magazine
Advertising, 1921-1930

(Each entry in column 3 falls midway between two months; it is shown here on the line of the earlier month.)

Year and     Magazine     12-month   2-month    Centered        Per cent of
month        advertising  moving     moving     12-month        12-month
             (thousands   total      total of   moving average  moving average
             of lines)               Col. 3     [Col. 4 ÷ 24]   [Col. 2 ÷ Col. 5]
(1)          (2)          (3)        (4)        (5)             (6)

1930
January      2,505        39,061     78,535     3,272            76.6
February     3,024        38,688     77,749     3,240            93.3
March        3,416        38,124     76,812     3,201           106.7
April        3,877        37,385     75,509     3,146           123.2
May          3,639        36,599     73,984     3,083           118.0
June         3,354        35,804*    72,403     3,017           111.2
July         2,451        ...        ...        ...               ...
August       2,057        ...        ...        ...               ...
September    2,598        ...        ...        ...               ...
October      3,021        ...        ...        ...               ...
November     3,042        ...        ...        ...               ...
December     2,820        ...        ...        ...               ...

* Calendar year totals.

Source: United States Department of Commerce, Survey of Current Business, October 1933, p. 20;
1936 Supplement, p. 24; December 1936, p. 25; May 1937, p. 26.


ment of the data of the solid line of Chart 175 A, while section B is a re- 
arrangement of the data of the solid line of Chart 175B. First, it is to 
be noted that the seasonal movements appear to be more regular in sec- 
tion B. This is because section A contains trend and cyclical movements, 
while these have been largely eliminated in section B. Secondly, a careful 
inspection of section B indicates that the seasonal pattern has not been 
uniform throughout the entire period. The amplitude of the seasonal 
swing is larger in the more recent years, the April peak having become 
relatively more pronounced. Although this change did not take place 
abruptly, nevertheless, the pattern seems to be fundamentally different 
after 1929. Consequently it seems advisable to compute two separate 
indexes, one based on 1922-1929 percentages, and the other on 1930-1935 
percentages.⁴

The procedure is now similar to that followed in obtaining a seasonal 
index from percentages of trend. In that case the problem was to average 


⁴ A method for obtaining a progressively changing or "moving" seasonal is explained
in Chapter XVIII.

out cyclical and irregular movements. In the present instance it is mainly
irregular movements which must be removed by averaging. 

In order to study the distribution more carefully, section A of Chart 
177 is made. Each circle or dot represents one of the percentages of 
Table 110, which is a rearrangement of data in the last column of Table 


Chart 175A. United States Magazine Advertising and 12-Month Moving Average,
1921-1936. (Vertical scale: millions of lines. 1921-1929 data are from Table 109;
1930-1936 data are based on figures of Table 116.)

Chart 175B. Percentages of 12-Month Moving Average of United States Magazine
Advertising and Seasonal Indexes, 1921-1936. (Vertical scale: per cent. Percentages
of moving average are represented by solid line, seasonal index by dotted line.
Based on figures of Tables 109 and 116.)


109. The scatter is in the main due to irregular movements, but it is also
partly due to lack of complete homogeneity. Use of a 12-month moving
average cannot completely eliminate cycles; it will not reach up into the tip
of their peaks or drop into the bottom of the troughs. A tabulation of the
number of times that each year includes one of the two highest percentages
Chart 176. United States Magazine Advertising: A. Unadjusted, and B. Percentages
of 12-Month Moving Average, 1921-1936. (For purposes of comparison the different
curves have been plotted close together on the same chart instead of on separate charts.
Each curve is plotted to the same vertical scale, but at a different level. This arrangement
is sometimes referred to as a multiple axis chart. For source of data, see Chart 175.)
for any monthly column of Table 110, or one of the two lowest, gives these
results:

          Two highest   Two lowest
1922           2             4
1923           2             2
1924           4             4
1925           1             4
1926           4             2
1927           4             2
1928           1             5
1929           6             1



Chart 177. Arrayed Percentages of 12-Month Moving Average and Seasonal Index,
United States Magazine Advertising: A. 1922-1929; B. 1930-1935. (1922-1929 data
are from Table 111; 1930-1935 data are based on figures of Table 116.)



TABLE 110

Percentages of 12-month Moving Averages of United States Magazine Advertising, 1921-1929
It is significant that five low values occurred in 1928, a year of depression
for magazine advertising, and six high values occurred in 1929, a year of
prosperity. The above tabulation does not tell the full story. For instance,
four highs and four lows occurred in 1924, but advertising was
high in the spring of 1924 and low in the fall. On account of the limited
number of observations, an extreme deviation, whether an irregular movement
or one of the systematic variations just described, exercises an undue
influence on an arithmetic mean. Probably, therefore, it is well to eliminate
the more extreme deviations before computing averages of the different
months.

There are two ways of deciding what items to eliminate. One way is 
to consider each array of section A of Chart 177 separately and to eliminate 
items that appear to be unusually high or low, perhaps studying each large 
deviation individually and eliminating those for which a special circum- 
stance can be discovered. If this method is followed, one array might 
use an average of all items, another might employ the median, a third the 
central six items, a fourth all items except the two highest, etc. These 
averages might be called modified means. On account of the extreme 
subjectivity of the method, it is dangerous unless the statistician possesses 
a high order of knowledge and judgment. 

A more objective method, and the one which is recommended here, 
employs uniform modified means. Each month is treated exactly alike, 
and an average is computed for the central items in an array. The num- 
ber of items to exclude is determined by inspection of a diagram like sec- 
tion A of Chart 177. Although this decision is subjective, a curb is put 
on the statistician's exercise of judgment by the necessity of treating each 
array alike. No generally accepted rule concerning the number of items 
to exclude can be laid down, but very often it will be found advisable to 
exclude roughly the highest and lowest ten per cent of each array. The 
number to exclude bears a relationship to the duration of the cyclical 
movements of the series: the shorter their duration, the larger the proportion
which should be excluded, other factors being equal.

Another way of explaining the desirability of this type of average is as 
follows: A mean becomes more reliable as the sample becomes larger, provided
the data are homogeneous. As the proportion of items averaged
from a monthly array is increased, the mean therefore becomes progres- 
sively more reliable up to a certain point, when the increasing heterogeneity 
of these data reverses the tendency. The trick, then, is to include as 
many items as possible without including too many that are too unrepre- 
sentative. In the present instance the middle six out of eight were aver- 
aged. These are shown as dots on section A of Chart 177 and the excluded 
extremes left as hollow circles. 
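A uniform modified mean of this kind is simple to compute. The following sketch applies it to the eight January percentages for 1922-1929 as read from column 6 of Table 109; the function name and its `drop_each_end` parameter are illustrative choices, not terms used in the text.

```python
def modified_mean(values, drop_each_end=1):
    """Mean of an array after dropping the extreme items at each end."""
    ordered = sorted(values)
    kept = ordered[drop_each_end:len(ordered) - drop_each_end]
    if not kept:
        raise ValueError("no items left to average")
    return sum(kept) / len(kept)


# January percentages of moving average, 1922-1929 (Table 109, column 6).
january = [89.2, 89.0, 86.6, 83.3, 84.9, 83.3, 80.8, 83.0]

# Middle six of eight: the lowest (80.8) and highest (89.2) are excluded.
jan_mean = modified_mean(january)
print(round(jan_mean, 1))
```

Treating every month with the same `drop_each_end` is what makes the modified means uniform; the only judgment left to the statistician is the single choice of how many items to exclude.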
Table 111 is a computation table similar to Table 107. In it the data
of Table 110 for calendar years 1922-1929 have been arrayed by months.
The items to be excluded from our averages have been set off by horizontal
lines. The modified means are shown in row 9. These total 1,202.8, and
are adjusted to total 1,200.0 by multiplying by the correction factor
.997672, which is 1,200.0 ÷ 1,202.8.⁵
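The adjustment to a total of 1,200 can be sketched as follows. Table 111 is not reproduced here, so the twelve modified means below are an assumption: they were recomputed from column 6 of Table 109 (middle six of eight, 1922-1929, rounded to one decimal, with November 1929 taken as 3,828 ÷ 3,315 = 115.5). They do total 1,202.8, as the text states.

```python
# Modified means by month, January first (recomputed; see the note above).
modified_means = [85.0, 97.4, 106.8, 118.5, 113.8, 102.8,
                  81.8, 72.2, 91.4, 112.2, 114.4, 106.5]

total = sum(modified_means)

# Correction factor that brings the twelve figures to a total of 1,200.
factor = 1200.0 / total

seasonal_index = [m * factor for m in modified_means]

for month_number, value in enumerate(seasonal_index, start=1):
    print(month_number, round(value, 1))
```

The scaling leaves the relative standing of the months unchanged; it only forces the average of the twelve index numbers to be exactly 100.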

In similar fashion an index based upon medians of the 1930-1935 per- 
centages has been computed, and is shown graphically by section B of 
Chart 177. It should be compared with section A of the same chart. 
Not only is the seasonal pattern different, but the data conform less con- 
sistently to the seasonal pattern, as can be seen from the dispersion of the 
circles and dots. Irregularity of economic events has been a characteristic 
of the years following 1929. 

A graphic approach to seasonal measurement. The use of the 12-month
moving average involves a considerable amount of labor. A graphic
method is available, however, which is similar in logic to that method,
and which produces reasonably accurate results if skillfully handled.⁶

First the data are plotted on semi-logarithmic paper. It is advisable 
to select accurately ruled paper with a large vertical scale (about 8 inches 
to a cycle), since measurements are to be made from the chart. Needless 
to say, the plotting must be accurate. Next is plotted, by inspection, an 
estimate of the combined trend and cyclical movements. If desired, an- 
nual averages may first be plotted in the middle of each year as a partial 
guide. The plotting of the Trend X Cycle estimate is largely subjective, 
however, and requires a high order of judgment on the part of the statis- 
tician. Chart 178 illustrates the first two steps of this method. The 
similarity of the broken line of this chart to the 12-month moving aver- 
age of Chart 175A should be noted, as well as the differences between 
them. The differences are mainly: (1) the freehand curve is smoother; 
(2) it is more flexible, dipping further into the troughs and reaching higher 
into the peaks. It is largely on account of the apparently more faithful 
representation of Trend X Cycle by this line that the graphic method is 
claimed by some authorities to be superior to the 12-month moving average 
method. 

The next step is to compute graphically the ratio of the original data to 
the estimate of Trend X Cycle. First, on a small card or piece of paper, 


⁵ It is possible to introduce a short cut into the computation of the seasonal index
by using percentages of 12-month moving totals rather than 12-month moving averages.
If this is done, the total of the modified means will be in the neighborhood of 100.00,
and the correction factor will be in the neighborhood of 12 rather than 1.

⁶ See "A Graphic Method of Measuring Seasonal Variation," by William A. Spurr,
Journal of the American Statistical Association, Vol. 32, June 1937, pp. 281-289.
draw a broken line perpendicular to its edge and label this line Trend X
Cycle. Next, place the edge of this card on the chart at January 1921,
with the Trend X Cycle mark directly on the broken line of the chart.

Chart 178. United States Magazine Advertising and Estimate of Trend X Cycle
Movements, 1921-1929; Logarithmic Vertical Scale. (Vertical scale: millions of lines.
Data of Table 109, Col. 2.)

Now, put a small mark on the edge of the card at the point which coincides
with the plotted point on the solid line for January 1921. Repeat this
process for January of each year. We now have ten lines running to the
edge of the card, a fairly long one representing the Trend X Cycle and nine
shorter ones representing the different Januaries. Since January advertising
is always rather low, each of these small marks lies below the Trend
X Cycle base line.

Chart 179. Graphic ... United States Magazine Advertising for Years ...
(Readings are from Chart 178.)

The January measurements are indicated on Chart 179. The distances
between the broken line and the various short lines are graphic representations
of estimates of the January seasonal ratio. The different estimates seem to
group around some central point; consequently an average estimate can be
marked on the card with reasonable accuracy. This average, which is a
modified mean, is the long solid line of the chart. In order to obtain
a numerical value for this average, we must place the card on a
logarithmic scale of the same size as that of Chart 178 and with the dotted
line on the scale value representing 100, and read the scale value corresponding
to the average line on the card. In this instance the reading
is 84.1.
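The card works because equal vertical distances on a logarithmic scale stand for equal ratios. The sketch below shows the arithmetic behind a single reading. The freehand Trend X Cycle curve is not available in numerical form, so the centered moving average for January 1922 from Table 109 stands in for it; that substitution, the 8-inch cycle, and the function names are all assumptions made for illustration.

```python
import math

INCHES_PER_CYCLE = 8.0  # the text suggests about 8 inches to a cycle


def card_distance(value, base):
    """Signed distance in inches from the base line to the plotted value."""
    return INCHES_PER_CYCLE * math.log10(value / base)


def scale_reading(distance_inches):
    """Scale value opposite the mark when the base line is set at 100."""
    return 100 * 10 ** (distance_inches / INCHES_PER_CYCLE)


actual = 1632        # January 1922, thousands of lines (Table 109, column 2)
trend_cycle = 1829   # stand-in for the freehand estimate (Table 109, column 5)

distance = card_distance(actual, trend_cycle)  # negative: mark lies below the line
reading = scale_reading(distance)              # the ratio, as a percentage
print(round(distance, 3), round(reading, 1))
```

One consequence of working on a ratio scale is that a midpoint struck between marks on the card corresponds to a geometric rather than an arithmetic average of the ratios, although for percentages this close together the two differ very little.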

A similar set of measurements is now made for each month, with results 
shown graphically in Chart 180. The sum of the readings will not total 
exactly 1,200, and so an adjustment must be made, as in Table 112. 

TABLE 112

Computation of Seasonal Index by Graphic
Method for United States Magazine
Advertising, 1921-1929

Month        Chart reading   Seasonal index*

January         84.1            84.0
February        98.0            97.9
March          107.0           106.9
April          117.0           116.9
May            112.5           112.4
June           102.5           102.4
July            82.0            81.9
August          72.8            72.7
September       90.5            90.4
October        112.8           112.7
November       115.8           115.7
December       106.5           106.4

Total        1,201.5         1,200.3

* Correction factor .99876 = 1,200.0 ÷ 1,201.5.


Link relative method. Though not so simple as the moving average
method or so easily adapted to complex types of seasonal movement, the
actual computations of the link relative method are much less extensive.
At one time it was probably the most widely used method. This method
is based upon the averaging of link relatives. A link relative is the value
for one month expressed as a percentage of the preceding. Thus the following
values are link relatives:

$$\frac{\text{Jan}}{\text{Dec}}, \quad \frac{\text{Feb}}{\text{Jan}}, \quad \frac{\text{Mar}}{\text{Feb}}, \quad \text{etc.}$$

Although subject to variation in detail, the steps involved in the calcu- 
lations may be summarized as follows: 

1. Express each month as a percentage of the preceding month. This is 
done in Table 113. 
Chart 180. Graphic Observations of Seasonal Movements of United States Magazine Advertising, 1921-1929. (Readings are from Chart 178.)
The labor of these computations can be lightened if the following procedure
is adopted. First, place the last number of the entire series in
the calculating machine and divide by the next to the last. In the present
instance

$$\frac{\text{December 1929}}{\text{November 1929}} = \frac{3{,}615}{3{,}828} = 94.4 \text{ per cent.}$$

When this result is obtained, clear the dials, but not the keyboard. Thus
3,828 is still on the keyboard and may now be put in the machine preparatory
to being divided by 3,760, the October 1929 value. By this procedure
a number need be placed on the keyboard only once, whereas if the work
proceeded from earliest month to latest, each number would have to be
set up twice.

Chart 181. Arrayed Link Relatives and Modified Means, United States Magazine
Advertising, 1922-1929. (Vertical scale: per cent. Data of Table 114.)

2. Average the link relatives for each month separately. The average
employed is usually the median but may be a modified mean. In the
present instance the mean of the middle six of the arrays of eight items
was selected. See Chart 181 and Table 114.

3. Chain the link relatives to the preceding month by successive multiplication.
This is done in row 10 of Table 114. January is arbitrarily taken
as 100 per cent, and the other numbers are thus percentages of the January
value. The last entry in this row, 111.89, is not the total of the row, but
is the new January chain relative obtained by multiplying 137.46 by .814.
(When multiplying percentages, remember that 115.6 per cent X 110.5
per cent = 1.156 X 1.105 = 1.2774, or 127.74 per cent.)





United States Magazine Advertising Link Relatives, 1922-1929

† Successive increments of 11.89 ÷ 12 = .9908.

‡ Row 12 X correction factor: 1,200.00 ÷ 1,424.31 = .842513.

Source: Table 113.
4. Adjust for trend by successive subtraction of a correction factor from
each chain relative. This correction factor is 1/12 of the amount by which
the second January chain relative exceeds the first; 11.89 ÷ 12 = .9908,
in this case. This correction is seen in rows 11 and 12 of Table 114.⁷

Chart 182. Seasonal Indexes of United States Magazine Advertising by Three Methods,
1922-1929. (Vertical scale: per cent. Data of Table 115.)

5. Adjust the chain relatives to total 1,200.0. The total of row 12 (excluding
the second January figure) is 1,424.31. The correction factor is
1,200.00 ÷ 1,424.31 = .842513, and each item in row 12 is multiplied by
this number. The results, shown in row 13, total 1,200.1 and are the
seasonal index numbers.
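The five steps above can be sketched as follows, using the monthly figures of Table 109, column 2 for 1921-1929. The function names are illustrative. March 1927 is taken as 3,255, the value that agrees with the 1927 calendar total of 36,453. Because the sketch carries full precision instead of rounding each row as a computation table would, its results may differ from the printed ones in the last decimal.

```python
def modified_mean(values, drop_each_end=1):
    """Mean of the central items after dropping the extremes at each end."""
    ordered = sorted(values)
    kept = ordered[drop_each_end:len(ordered) - drop_each_end]
    return sum(kept) / len(kept)


def seasonal_index_from_links(avg_links):
    """Steps 3-5. `avg_links` holds 12 average link relatives in per cent,
    January (relative to December) first."""
    # Step 3: chain by successive multiplication, with January = 100.
    chain = [100.0]
    for rel in avg_links[1:]:
        chain.append(chain[-1] * rel / 100)
    second_january = chain[-1] * avg_links[0] / 100
    # Step 4: subtract 0, 1, 2, ... twelfths of the January discrepancy.
    step = (second_january - 100.0) / 12
    adjusted = [c - k * step for k, c in enumerate(chain)]
    # Step 5: scale so that the twelve figures total 1,200.
    factor = 1200.0 / sum(adjusted)
    return [a * factor for a in adjusted]


# Thousands of lines, January 1921 through December 1929.
series = [
    1979, 1981, 2005, 2099, 2145, 1933, 1573, 1402, 1620, 1824, 1903, 1807,
    1632, 1768, 1922, 2171, 2215, 2046, 1705, 1566, 1940, 2470, 2466, 2464,
    2093, 2301, 2557, 2963, 2850, 2632, 2150, 1864, 2277, 2873, 2900, 2773,
    2281, 2557, 2915, 3273, 2988, 2830, 2017, 1819, 2242, 2779, 2900, 2841,
    2111, 2513, 2712, 2951, 2854, 2635, 2068, 1878, 2485, 3062, 3244, 2960,
    2385, 2854, 3030, 3343, 3236, 3024, 2368, 2181, 2831, 3444, 3586, 3209,
    2536, 3005, 3255, 3497, 3577, 3012, 2420, 2229, 2762, 3411, 3573, 3176,
    2447, 2846, 3212, 3675, 3435, 3061, 2583, 2158, 2805, 3499, 3486, 3172,
    2684, 3158, 3601, 4082, 3875, 3547, 2864, 2430, 3162, 3760, 3828, 3615,
]

# Step 1: each month as a percentage of the preceding month.
links = [100 * b / a for a, b in zip(series, series[1:])]
# links[11] is January 1922 relative to December 1921.
links_1922_1929 = links[11:]

# Step 2: mean of the middle six of each monthly array of eight.
avg_links = [modified_mean(links_1922_1929[m::12]) for m in range(12)]

index = seasonal_index_from_links(avg_links)
print([round(x, 1) for x in index])
```

The subtraction in step 4 removes the same amount each month, which is why the adjustment suits a straight-line trend; the second January figure is brought back to 100 exactly.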

⁷ This method of adjustment for trend seems appropriate for these data, since the
trend is a straight line. If the trend is an exponential curve, a logarithmic adjustment
should be made. Beginning with step 3 the procedure is as follows:

3. Obtain logarithms of the average link relatives and total them. Since the logarithm
of 100 is 2.0, the sum of the logarithms would tend to equal 24.000 were there no trend
in the data. (This will be clear if it is realized that successive multiplication of the
twelve link relatives, taken as decimals, would obtain a product of 1.0 under the same
circumstances.)

4. Adjust for trend by subtracting a correction factor from each logarithm. The correction
factor is 1/12 of the amount by which the sum of the logarithms exceeds 24.

5. Chain the logarithms of the corrected link relatives to the preceding month by successive
addition. This is the logarithmic equivalent of successive multiplication of the
link relatives. The January logarithm is arbitrarily taken as 2.0, the logarithm of 100.
The final logarithm in the chain is not the sum of the other numbers in that row, but
is the sum of the December logarithmic chain relative and the January adjusted logarithmic
link relative.

6. Obtain the anti-logarithms of the logarithmic chain relatives. These are, of course,
the chain relatives; that is, the seasonal variation of each month relative to January as
100. In looking up the anti-logarithms, the characteristics of the logarithms are taken
as 2 or 1. This is because, as a matter of convenience, the link relatives are originally
recorded as percentages rather than as decimals. Had decimals been used, the characteristics
would each have been either 0 or -1.

7. Adjust the chain relatives to total 1,200.0. This is done in the usual fashion by
multiplying by a correction factor. The results are the seasonal index numbers.
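The logarithmic variant described in this footnote can be sketched as below. The check at the end uses made-up link relatives built from a known seasonal pattern and a steady growth of 1 per cent a month, so the expected answer is known in advance; none of these test figures come from the tables.

```python
import math


def seasonal_index_log_adjust(avg_links):
    """Seasonal index from 12 average link relatives (per cent, January
    first), removing a constant-rate trend by a logarithmic correction."""
    # Step 3: logarithms of the link relatives; they total 24 with no trend.
    logs = [math.log10(rel) for rel in avg_links]
    # Step 4: subtract one-twelfth of the excess over 24 from each logarithm.
    correction = (sum(logs) - 24.0) / 12
    adjusted = [lg - correction for lg in logs]
    # Step 5: chain by successive addition, January = 2.0. The 2.0 is taken
    # off each added logarithm because the relatives are percentages.
    chain_logs = [2.0]
    for lg in adjusted[1:]:
        chain_logs.append(chain_logs[-1] + lg - 2.0)
    # Step 6: anti-logarithms give the chain relatives (January = 100).
    chain = [10 ** c for c in chain_logs]
    # Step 7: scale to a total of 1,200.
    factor = 1200.0 / sum(chain)
    return [c * factor for c in chain]


# Hypothetical check: a known seasonal pattern with 1 per cent monthly growth.
pattern = [84.8, 97.2, 106.6, 118.2, 113.5, 102.6,
           81.6, 72.0, 91.2, 111.9, 114.1, 106.3]
growth = 1.01
test_links = [100 * growth * pattern[m] / pattern[m - 1] for m in range(12)]
recovered = seasonal_index_log_adjust(test_links)
```

After the correction the logarithms total exactly 24, so the chain closes on itself and no second-January discrepancy remains to be distributed.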
The link relative method of seasonal index construction has not so much
to commend it as has the moving average method. The link relatives 
averaged together contain both trend and cyclical movements. Although 
the trend is subsequently removed, the process is effective only if the 
growth is one of constant amount or constant rate. Nor is the method 
so readily adaptable as the other to the construction of some of the more 
complex types of seasonal movements that will be described in the follow- 
ing chapter. Furthermore, it is a confusing method for most beginners. 
Yet it has some theoretical and practical advantages. It is a characteristic 
of time series that values of the original data for a given month are in part 
dependent on values existing during a few of the more recent preceding 

TABLE 115

Seasonal Indexes of United States Magazine Advertising,
1922-1929, as Obtained by Three Methods

Month        Moving average   Graphic   Link relative
             method           method    method

January       84.8             84.0      84.3
February      97.2             97.9      96.6
March        106.6            106.9     106.0
April        118.2            116.9     118.0
May          113.5            112.4     113.2
June         102.6            102.4     102.8
July          81.6             81.9      81.1
August        72.0             72.7      71.2
September     91.2             90.4      91.0
October      111.9            112.7     113.5
November     114.1            115.7     115.8
December     106.3            106.4     106.6

Source: Tables 111, 112, and 114.


months. Or, to put it another way, irregular movements are frequently
of more than one month's duration. If then, let us say, March, April
and May are each unusually high months, nevertheless, April will not be
high relative to March, or May high relative to April. Thus the link
relatives in such an instance are less disturbed by an irregular movement
than would be percentages of a 12-month moving average. Also, with the
link relative method, cyclical peaks and troughs do not influence the percentages
from which the index is computed to the same degree as with the
moving average method. Another point in favor of the method is that
it more completely utilizes the data. There is only one less link relative
than the number of months available, whereas a 12-month moving average
(see Table 109) cuts off 6 months at each end. For short periods, therefore,
the link relative method is very useful.

Comparison of results. The seasonal indexes by the last three methods, 
which employ a fairly refined technique, are given in Table 115. There is 
little to choose between the results.⁸ As shown by Chart 182, the three
lines are very difficult to distinguish from one another. 

Adjustment for Seasonal 

Chart 183. United States Magazine Advertising, Deseasonalized Data and Trend,
1921-1937. (Data of Table 116. For trend, see Table 89.)

A seasonal index may be computed for the purpose of studying the seasonal
movement itself, possibly in order to avoid the consequences of it,
possibly in order to smooth out the seasonal fluctuations. On the other
hand, it may be that the computation of seasonal is but one step in the
isolation of cyclical movements, either alone or in combination with secular 
trend. Often we are interested in studying the cyclical movements, and 
in this case it is necessary to adjust the original data, in turn, for seasonal 
movements and for trend. Sometimes, however, it may be desirable to 
study the combined effect of trend and cyclical movements. Thus busi- 
ness men, in making decisions, may consider not so much whether their 
sales are increasing relative to some vague trend as whether their sales are 
increasing or decreasing more than could be expected after taking the 
season of the year into consideration. 

The mechanics of eliminating seasonal is to divide the original data by 

® The three indexes are not strictly comparable as to period of time covered. The
moving average method employs percentages of moving average from 1922 through
1929. The link relatives are for the same period. The graphic method, however,
includes readings for 1921 as well.




TABLE 116

Elimination of Seasonal Variations from United States Magazine Advertising, 1921-1937

(Seasonal indexes were computed by 12-month moving average method.)

Year and       Original    Seasonal    Deseasonalized data
month          data        index       [Col. 2 ÷ Col. 3]
(1)            (2)         (3)         (4)

1921
January        1,979        84.8       2,334
February       1,981        97.2       2,038
March          2,005       106.6       1,881
April          2,099       118.2       1,776
May            2,145       113.5       1,890
June           1,933       102.6       1,884
July           1,573        81.6       1,928
August         1,402        72.0       1,947
September      1,620        91.2       1,776
October        1,824       111.9       1,630
November       1,903       114.1       1,668
December       1,807       106.3       1,700

1922
January        1,632        84.8       1,925
February       1,768        97.2       1,819
March          1,922       106.6       1,803
April          2,171       118.2       1,837
May            2,215       113.5       1,952
June           2,046       102.6       1,994
July           1,706        81.6       2,089
August         1,566        72.0       2,175
September      1,940        91.2       2,127
October        2,470       111.9       2,207
November       2,466       114.1       2,161
December       2,464       106.3       2,318

1923
January        2,093        84.8       2,468
February       2,301        97.2       2,367
March          2,557       106.6       2,399
April          2,963       118.2       2,507
May            2,850       113.5       2,511
June           2,632       102.6       2,565
July           2,150        81.6       2,635
August         1,864        72.0       2,589
September      2,277        91.2       2,497
October        2,873       111.9       2,567
November       2,900       114.1       2,542
December       2,773       106.3       2,609

1924
January        2,281        84.8       2,690
February       2,557        97.2       2,631
March          2,915       106.6       2,735
April          3,273       118.2       2,769
May            2,988       113.5       2,633
June           2,830       102.6       2,758
July           2,017        81.6       2,472
August         1,819        72.0       2,526
September      2,242        91.2       2,458
October        2,779       111.9       2,483
November       2,900       114.1       2,542
December       2,841       106.3       2,673
TABLE 116 (Continued)

Elimination of Seasonal Variations from United States Magazine Advertising, 1921-1937

(Seasonal indexes were computed by 12-month moving average method.)

Year and       Original    Seasonal    Deseasonalized data
month          data        index       [Col. 2 ÷ Col. 3]
(1)            (2)         (3)         (4)

1925
January        2,111        84.8       2,489
February       2,513        97.2       2,585
March          2,712       106.6       2,544
April          2,951       118.2       2,497
May            2,854       113.5       2,515
June           2,635       102.6       2,568
July           2,068        81.6       2,634
August         1,878        72.0       2,608
September      2,485        91.2       2,725
October        3,062       111.9       2,736
November       3,244       114.1       2,843
December       2,960       106.3       2,785

1926
January        2,385        84.8       2,812
February       2,864        97.2       2,936
March          3,030       106.6       2,842
April          3,343       118.2       2,828
May            3,236       113.5       2,851
June           3,024       102.6       2,947
July           2,368        81.6       2,902
August         2,181        72.0       3,029
September      2,831        91.2       3,104
October        3,444       111.9       3,078
November       3,586       114.1       3,143
December       3,209       106.3       3,019

1927
January        2,636        84.8       2,991
February       3,005        97.2       3,092
March          3,255       106.6       3,053
April          3,497       118.2       2,959
May            3,577       113.5       3,152
June           3,012       102.6       2,936
July           2,420        81.6       2,939
August         2,229        72.0       3,096
September      2,762        91.2       3,029
October        3,411       111.9       3,048
November       3,573       114.1       3,131
December       3,176       106.3       2,988

1928
January        2,447        84.8       2,886
February       2,846        97.2       2,928
March          3,212       106.6       3,013
April          3,675       118.2       3,109
May            3,435       113.5       3,026
June           3,061       102.6       2,983
July           2,583        81.6       3,165
August         2,158        72.0       2,997
September      2,805        91.2       3,076
October        3,499       111.9       3,127
November       3,486       114.1       3,055
December                   106.3       2,984
TABLE 116 (Continued)

Elimination of Seasonal Variations from United States Magazine Advertising, 1921-1937

(Seasonal indexes were computed by 12-month moving average method.)

Year and       Original    Seasonal    Deseasonalized data
month          data        index       [Col. 2 ÷ Col. 3]
(1)            (2)         (3)         (4)

1929
January        2,684        84.8       3,165
February       3,158        97.2       3,249
March          3,601       106.6       3,378
April          4,082       118.2       3,453
May            3,875       113.5       3,414
June           3,547       102.6       3,457
July           2,864        81.6       3,510
August         2,430        72.0       3,375
September      3,162        91.2       3,467
October        3,760       111.9       3,360
November       3,828       114.1       3,355
December       3,615       106.3       3,401

1930
January        2,505        75.1       3,336
February       3,024        96.2       3,143
March          3,416       107.5       3,178
April          3,877       123.6       3,137
May            3,639       122.0       2,983
June           3,354       111.1       3,019
July           2,451        83.3       2,942
August         2,057        71.7       2,869
September      2,598        87.7       2,962
October        3,021       107.9       2,800
November       3,042       110.5       2,753
December       2,820       103.3       2,730

1931
January        2,001        75.1       2,664
February       2,539        96.2       2,639
March          2,762       107.5       2,569
April          3,026       123.6       2,448
May            2,971       122.0       2,435
June           2,732       111.1       2,459
July           1,998        83.3       2,399
August         1,713        71.7       2,389
September      2,069        87.7       2,359
October        2,480       107.9       2,298
November       2,444       110.5       2,212
December       2,170       103.3       2,101

1932
January        1,570        75.1       2,091
February       2,000        96.2       2,079
March          2,184       107.5       2,032
April          2,348       123.6       1,900
May            2,278       122.0       1,867
June           1,903       111.1       1,713
July           1,394        83.3       1,673
August         1,173        71.7       1,636
September      1,310        87.7       1,494
October        1,607       107.9       1,489
November       1,754       110.5       1,587
December       1,641       103.3       1,589
TABLE 116 (Continued)

Elimination of Seasonal Variations from United States Magazine Advertising, 1921-1937

(Seasonal indexes were computed by 12-month moving average method.)

Year and       Original    Seasonal    Deseasonalized data
month          data        index       [Col. 2 ÷ Col. 3]
(1)            (2)         (3)         (4)

1933
January        1,116        75.1       1,486
February       1,490        96.2       1,549
March          1,630       107.5       1,516
April          1,729       123.6       1,399
May            1,732       122.0       1,420
June           1,544       111.1       1,390
July           1,272        83.3       1,627
August         1,184        71.7       1,651
September      1,407        87.7       1,604
October        1,870       107.9       1,733
November       1,899       110.5       1,719
December       1,791       103.3       1,734

1934
January        1,375        75.1       1,831
February       1,765        96.2       1,835
March          2,013       107.5       1,873
April          2,469       123.6       1,998
May            2,501       122.0       2,050
June           2,271       111.1       2,044
July           1,863        83.3       2,224
August         1,534        71.7       2,139
September      1,827        87.7       2,083
October        2,264       107.9       2,098
November       2,317       110.5       2,097
December       2,136       103.3       2,068

1935
January        1,581        75.1       2,105
February       2,014        96.2       2,094
March          2,276       107.5       2,117
April          2,700       123.6       2,184
May            2,618       122.0       2,146
June           2,335       111.1       2,102
July           1,831        83.3       2,198
August         1,497        71.7       2,088
September      1,812        87.7       2,066
October        2,181       107.9       2,021
November       2,201       110.5       1,992
December       2,334       103.3       2,259

1936
January        1,696        75.1       2,258
February       2,128        96.2       2,212
March          2,511       107.5       2,336
April          2,860       123.6       2,314
May            2,852       122.0       2,338
June           2,637       111.1       2,374
July           1,967        83.3       2,361
August         1,695        71.7       2,364
September      2,084        87.7       2,376
October        2,637       107.9       2,444
November       2,736       110.5       2,476
December       2,731       103.3       2,644
TABLE 116 (Continued)

Elimination of Seasonal Variations from United States Magazine Advertising, 1921-1937

(Seasonal indexes were computed by 12-month moving average method.)

Year and       Original    Seasonal    Deseasonalized data
month          data        index       [Col. 2 ÷ Col. 3]
(1)            (2)         (3)         (4)

1937
January        2,031        75.1       2,704
February       2,399        96.2       2,494
March          2,762       107.5       2,569
April          3,206       123.6       2,594
May            3,258       122.0       2,670
June           3,023       111.1       2,721
July           2,235        83.3       2,683
August         2,018        71.7       2,815
September      2,383        87.7       2,717
October        2,852       107.9       2,643
November       2,989       110.5       2,705
December       2,893       103.3       2,801

Source: Tables 109 and 111, and data of Chart 177B.


the seasonal index, as in Table 116. Note that two seasonal indexes have
been used: one for the years before 1930, and a different one for 1930
through 1937, both of which were obtained by the 12-month moving average
method. The results are shown in Chart 183. The deseasonalized
data contain three elements: trend, cyclical movements, and irregular
movements. Sometimes it is desirable at this stage of the analysis to
smooth out the irregular variations from the deseasonalized data. A
consideration of the technique for accomplishing this is reserved for
Chapter XIX.
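
The division described above may be illustrated by a brief computation. The following is a minimal sketch, not a prescribed routine: the two sets of seasonal index numbers and the January figures for 1929 and 1930 are copied from Table 116, while the function name and its arguments are hypothetical.

```python
# Seasonal indexes (per cent) from Table 116, January through December.
INDEX_BEFORE_1930 = [84.8, 97.2, 106.6, 118.2, 113.5, 102.6,
                     81.6, 72.0, 91.2, 111.9, 114.1, 106.3]
INDEX_FROM_1930 = [75.1, 96.2, 107.5, 123.6, 122.0, 111.1,
                   83.3, 71.7, 87.7, 107.9, 110.5, 103.3]


def deseasonalize(year, month, original):
    """Divide the original figure by the seasonal index for that month.

    month runs from 1 (January) to 12 (December); the index is a
    percentage, so it is divided by 100 before use.
    """
    index = INDEX_BEFORE_1930 if year < 1930 else INDEX_FROM_1930
    return original / (index[month - 1] / 100.0)


print(round(deseasonalize(1929, 1, 2684)))  # 3165, as in Table 116
print(round(deseasonalize(1930, 1, 2505)))  # 3336, as in Table 116
```

Each set of twelve index numbers totals approximately 1200, so that deseasonalizing leaves the general level of the series unchanged.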

Test of seasonal. A comparison of the percentages of 12-month moving 
average with the seasonal indexes as plotted on Chart 175B indicates the 
closeness of agreement between the two. The same data are arranged 
differently in sections A and B of Chart 177. The closeness of the dots 
and circles to the seasonal index line in these charts constitutes a similar 
test of the adequacy of the seasonal index. Thus it is apparent that the 
fit in section A of Chart 177 is excellent, and much better than that in 
section B. It is also possible to measure the reliability of the different 
seasonal indexes, ascertaining the significance of their deviations from 100 
per cent.^ Again, it may be desired to test whether the seasonal index 
numbers used for the later period are significantly different from those 
used for the earlier period. Furthermore, it might be important to know 
whether the index number for a given month is significantly different from 

® This is sometimes tested by means of analysis of variance (see Chapter XIII) and
by means of the correlation ratio (see Chapter XXIII). These tests are subject to
the limitations mentioned in this section.





that for some other month or from some other 
value for the same month. The application 
of the theory of sampling to seasonal index 
numbers involves theoretical difficulties, how- 
ever, that have not been fully overcome. 
These difficulties arise because: (1) modified 
means, rather than means of all the data, are 
ordinarily used in constructing seasonal in- 
dexes; (2) the distributions from which these 
means are computed do not represent random 
distributions, since the irregular movements 
of a time series are not random occurrences. 

A very practical test is to see whether the 
use of the seasonal index adopted appears to 
eliminate all of the seasonal movement. The 
data of Chart 183 have been rearranged in 
Chart 184 in a manner similar to Chart 176. 
Close inspection of this chart reveals a slight 
similarity in the pattern for the years 1921, 
1922, 1923, and 1924. For instance, in each 
of these years February is lower than January. 
The patterns from 1925 through 1929 also ex- 
hibit a faint family resemblance. These facts 
indicate that the indexes adopted do not com- 
pletely eliminate the seasonal movement. 
Perhaps it would have been better to have 
further subdivided the data into periods of 
time with a separate index for each; though if 
subdivided too finely, not enough years would 
be available in each subperiod to provide a 
reliable seasonal index. It might have been 
better to have constructed a moving seasonal, 
the procedure for which will be explained in 
the following chapter. It should not be con- 
cluded, however, that the seasonal indexes 
selected are poor generalizations. Though the
indexes are not perfect, the different tests in- 
dicate that they are very satisfactory. 
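
The observation that February is lower than January in each of the years 1921 through 1924 can be verified directly from the deseasonalized figures. The sketch below merely restates that comparison; the figures are from column 4 of Table 116, and the variable names are hypothetical.

```python
# Deseasonalized (January, February) figures from Table 116, column 4.
deseasonalized = {
    1921: (2334, 2038),
    1922: (1925, 1819),
    1923: (2468, 2367),
    1924: (2690, 2631),
}

# If the seasonal movement had been removed completely, there would be no
# reason for February to fall below January in every one of these years.
february_lower = {year: feb < jan for year, (jan, feb) in deseasonalized.items()}
print(february_lower)
```

A pattern that recurs in every year of a period is the kind of residual seasonal movement that the practical test is intended to reveal.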

Selected References 

E. C. Bratt: Business Cycles and Forecasting, Chapter II; Business Publications, Inc.,
Chicago, 1937. Mainly a consideration of economic factors



Chart 184. Deseasonalized 
United States Magazine Adver- 
tising Data, 1921-1936. (For 
purposes of comparison the dif- 
ferent curves have been plotted 
close together on the same chart 
instead of on separate charts. 
Each curve is plotted to the same 
vertical scale, but at a different 
level. This arrangement is some- 
times referred to as a multiple 
axis chart. Data of Table 116.) 
R. W. Burgess: Introduction to the Mathematics of Statistics, Chapter IX; Houghton
Mifflin Co., Boston, 1927. Classifies and describes a number of methods of
computing seasonal indexes.

R. E. Chaddock: Principles and Methods of Statistics, pages 337-353; Houghton
Mifflin Co., Boston, 1925.

E. E. Day: Statistical Analysis, Chapter XVIII; Macmillan Co., New York, 1927.
Explains the link-relative method in detail.

F. E. Croxton and D. J. Cowden: Practical Business Statistics, Chapter XIV;
Prentice-Hall, Inc., New York, 1934. Includes illustration of use of a short
cut in the moving average method.

J. R. Stockton: An Introduction to Business Statistics, Chapter IX; D. C. Heath
and Co., Boston, 1938.



CHAPTER XVIII 


TYPES OF SEASONAL MOVEMENTS 


In this chapter will be described methods of isolating some additional 
types of seasonal movements and further refinements of methodology. Of 
course, not all time series require the specialized treatment here outlined. 
The methods involved are somewhat more complex than those previously 
described, although the mathematics involved is entirely elementary. 

Progressive Changes in Seasonal Pattern 

Use of moving averages. The most satisfactory method of computing 
a seasonal index that is gradually changing in pattern, frequently called a 
moving seasonal, is based upon the fitting of curves to percentages of a 
moving average. Usually a 12-month moving average is used, the initial 
step therefore being the same as the most highly recommended method 
described in Chapter XVII. This will be the method used here.^ 

1 A difficulty of a 12-month moving average was found in Chapter XVII to be that
it smoothed out not only seasonal movements but part of the cycle as well. The
percentages of 12-month moving average, therefore, included not only seasonal
and irregular movements, but some of the cycle also. Furthermore, a 12-month
moving average is not so smooth as might be desired. If extreme accuracy is
desired, a much more laborious moving average is sometimes recommended. In
Chapter XVI it was explained that weighted moving averages could be used
for trends; and a curve, at the same time smoother and more flexible, could be
obtained if the weight pattern selected were a smooth one. Macaulay recommends
for the study of seasonal movements a considerably more complex weight system,
involving 43 months, the diagram of which is shown herewith. It is stated
that this moving average follows with almost complete accuracy symmetrical
cyclical movements of from 30 to 120 months' duration. Actually, however,

Weight Pattern of a 43-Term Moving Average. (Formula taken from Frederick R.
Macaulay, The Smoothing of Time Series, p. 148, National Bureau of Economic
Research, New York, 1931.)

Computation of moving seasonal. As indicated above, the first step
may be the computation of a 12-month moving average. Department
store sales and moving average are plotted in Chart 185. The appearance
of the chart indicates that, as compared with the seasonal fluctuations,
the cyclical movements are relatively unimportant. A 12-month average
is therefore sufficiently accurate for satisfactory results. The moving average
has been extended freehand for 6 months in either direction to the
edge of the chart, in order not to lose any of the original data. These
extensions are shown by dotted lines. Next, the original data are divided
by the moving average figures and recorded as percentages.

The method of deriving the moving seasonal is, from this point on, 
essentially graphic. On a large piece of graph paper which has been di- 
vided into twelve sections, one for each month, are plotted the percentage 
data. Thus, in the January rectangle of Chart 186, the January percent- 
ages for each year are plotted as shown by the thin line. In the first six 
sections, 1919 is connected with 1920 by a broken line to remind the com- 
puter that the 1919 figure is based on estimated data. Similarly, in the 
last six sections, 1935 is connected with 1936 by a broken line for the 
same reason. Also, there is a broken line connecting 1933 with 1932 and 
1934 for January, February, and March. (This may be seen clearly in 
only the March section.) This line is broken because the percentage of 
moving average figure for those months was raised in an attempt to mini- 
mize the effect of the bank holidays during those months. 

Next, each of the twelve curves of Chart 186 is smoothed by means of 
a 5-term moving average. In this chart the moving average data are 
represented by crosses. The object of the moving average is merely to 
aid in the location of the first approximation line, which was obtained by 
inspection and is shown by means of large dots. (Instead of freehand 
approximations, curves may be fitted mathematically to the percentages 
of moving average.) These dots are extended beyond the data as smoothed 
by the moving average, so as to include each year from 1919 through 1937. 
This extension provides us with a forecast for 1937 to be used in deseasonalizing
that year's data. When drawing in this line, the statistician should
cyclical movements are not symmetrical and the use of this method may lead to
unreasonable results. For further discussion, see Frederick R. Macaulay, The Smoothing
of Time Series, National Bureau of Economic Research, New York, 1931. On page
148 is given the weight pattern. See page 159 for its degree of conformity to a sine
curve. The formula for computing this moving average is given on page 73: "Take
a 5-months moving total of a 5-months moving total of an 8-months moving total
of a 12-months moving total of the data. To the results apply the following
extremely simple weights: 7, -10, 0, 0, 0, 0, 0, 0, +10, 0, 0, 0, 0, 0, 0, -10, +7. Divide
the final results by 9600." In Chapter VII Macaulay gives criteria for judging weight
formulae, and in Chapter IV discusses a number of individual formulae, some of which
he finds satisfactory.
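
The quoted formula can be checked by building the weight pattern it implies. The sketch below is an illustration under the assumption that successive moving totals may be combined by convolution of their weight patterns; the helper names are hypothetical. It confirms that the result has 43 terms and that the weights total 9600, which is why that figure is the divisor.

```python
def convolve(a, b):
    """Combine two weight patterns applied one after the other."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def moving_total(n):
    """Weight pattern of a simple n-month moving total."""
    return [1] * n


# The "extremely simple weights" quoted from Macaulay, page 73.
FINAL_WEIGHTS = [7, -10, 0, 0, 0, 0, 0, 0, 10, 0, 0, 0, 0, 0, 0, -10, 7]

weights = moving_total(12)
for span in (8, 5, 5):
    weights = convolve(weights, moving_total(span))
weights = convolve(weights, FINAL_WEIGHTS)

print(len(weights))  # 43 terms
print(sum(weights))  # 9600, the divisor in the formula

# Dividing by 9600 gives weights that total 1.
normalized = [w / 9600 for w in weights]
```

The pattern is symmetrical about its middle term, as a centered moving average requires.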




Chart 185. Federal Reserve Index of Department Store Sales, 1919-1936, and 12-Month Moving Average. (For source of data see Chart 108.)
draw relatively simple curves, and also attempt to be fairly conservative
at the end years. "Conservatism" means here that the curves should,
wherever it seems at all reasonable, have a tendency to flatten out rather
than incline more steeply at the ends. For it must be remembered that 
these trends are not affected by the same factors associated with secular 
trend; they will not continue in a given direction indefinitely, but are more 
likely to move to a certain level and remain stabilized until new factors 
bring about a change in one direction or another. An additional reason 
for conservatism is that the first and last 6 months are at least partly 
estimated and not entirely trustworthy. 

Annual values for each month are now read from the dots and entered 
in a table such as Table 117. The figures in the columns are not the final 
seasonal indexes, for each year does not total 1200 per cent. The statis- 
tician must now go back to his chart, and draw new lines, keeping three 
purposes in view: (1) make each year total 1200; (2) keep the curve in 
each rectangle smooth; (3) obtain final curves which fit the data. This is, 
of course, a highly subjective procedure and should not be undertaken by 
one lacking experience or not familiar with the series. Final results are 
shown in Table 118 and by the heavy lines of Chart 186. 

If the reader will now turn to Chart 187, he will see the moving seasonal 
pattern for this series as shown by the dotted line. The changes in the 
seasonal pattern are so gradual that one is tempted to think that they are 
insignificant. However, if Chart 188 is referred to, it is apparent that 
there is a considerable difference between the 1919 seasonal pattern and 
the projected pattern for 1937. The more recent year shows considerably 
more amplitude of fluctuation, but most noticeable is the great increase 
in the relative importance of the Christmas trade. 

The procedure for computing a moving seasonal may be summarized 
briefly as follows:^ 

(1) Compute 12-month moving averages of original data. 

(2) Divide original data by these moving average figures and express 
as percentages. 

(3) Plot these percentages each month, by years. 

(4) Smooth with a 5-term (or other) moving average. 

(5) Draw a freehand trend line and read values from it. 

(6) Adjust these first approximation figures to total 1200 for each 
year, at the same time retaining a smooth, well-fitting trend. 
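
The six steps may be sketched in code as follows. This is a rough illustration and not the procedure of the text in every particular: the freehand trend line of step 5 cannot be reproduced mechanically, so the sketch assumes that the 5-term moving average itself, held level at the end years, serves as the trend, and step 6 is done by simple proportional scaling. The series is hypothetical, and all function names are assumptions.

```python
def centered_12_month_average(series):
    """Step 1: 12-month moving average centered on each month (None at the ends)."""
    out = [None] * len(series)
    for i in range(6, len(series) - 6):
        first = sum(series[i - 6:i + 6]) / 12.0
        second = sum(series[i - 5:i + 7]) / 12.0
        out[i] = (first + second) / 2.0
    return out


def smooth_5_term(values):
    """Step 4: 5-term moving average, held level for the two values at each end."""
    n = len(values)
    core = [sum(values[i - 2:i + 3]) / 5.0 for i in range(2, n - 2)]
    return [core[0]] * 2 + core + [core[-1]] * 2


def moving_seasonal(series):
    """Return 12 index numbers for each year; series must start in January
    and cover at least six whole years."""
    years = len(series) // 12
    average = centered_12_month_average(series)
    table = [[None] * 12 for _ in range(years)]
    for month in range(12):
        # Steps 2 and 3: percentages of moving average for this month, by years.
        known = [(y, 100.0 * series[12 * y + month] / average[12 * y + month])
                 for y in range(years) if average[12 * y + month] is not None]
        # Steps 4 and 5 (assumed): smooth, then extend level to the missing year.
        smoothed = smooth_5_term([p for _, p in known])
        first_year = known[0][0]
        for y in range(years):
            k = min(max(y - first_year, 0), len(smoothed) - 1)
            table[y][month] = smoothed[k]
    # Step 6: scale each year so that its twelve index numbers total 1200.
    return [[v * 1200.0 / sum(row) for v in row] for row in table]


# Hypothetical series: 8 years, rising trend, a December peak that grows.
base = [90, 85, 95, 100, 100, 95, 75, 80, 100, 110, 115, 155]
series = []
for t in range(96):
    month, year = t % 12, t // 12
    pattern = base[month] + (year if month == 11 else 0)
    series.append((100 + t) * pattern / 100.0)

indexes = moving_seasonal(series)
print([round(v, 1) for v in indexes[0]])
print([round(v, 1) for v in indexes[-1]])
```

Proportional scaling makes each year total 1200 but may disturb the smoothness of the individual monthly curves slightly, which is why the text recommends redrawing the curves with all three purposes in view.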


2 This procedure is taken from "Use of Moving Averages in the Measurement of
Seasonal Variations," by Aryness Joy and Woodlief Thomas, Journal of the American
Statistical Association, Vol. XXIII, September 1928, pp. 247-249.
If a series that contains a moving seasonal is deseasonalized by a stable
seasonal index, the adjusted data will contain not only the ordinary type 
of irregular movements but additional irregularities where the stable sea- 
sonal index has under-corrected for seasonal in some places and over- 
corrected in others. If the series has been adjusted by a moving seasonal, 
the resulting series should be somewhat smoother. That this seems to be 
the case with department store sales can be seen by a comparison of sec- 
tions A and B of Chart 191. Of course, it is true that, if the moving 


Chart 188. Seasonal Pattern of Department Store Sales for 1919 and 1937. (Data of Table 118.)

seasonal has been made too flexible, it may be merely a combination of a 
stable seasonal and smoothed random fluctuations, and deseasonalizing 
by such an index may remove not only the seasonal movements but some 
of the fluctuations of an irregular character as well. In the present in- 
stance the smoothness of the freehand curves of Chart 186 and the close- 
ness of their fit to the percentages of moving average make this limitation 
seem unimportant. Furthermore, if the data are eventually to be
partially smoothed of irregular movements, this preliminary smoothing
probably does no great harm.

Sudden Variations in Seasonal Pattern

Seasonal patterns may change abruptly, rather than gradually, and then 
the device of a moving seasonal would be inapplicable. Such changes may 
involve merely the relative importance of two consecutive months, or may 
involve a change in the entire pattern. The most obvious change of the 
first type is that occasioned by the varying date of Easter, which may 
range from March 22 to April 25. 

Adjustment for Easter. A number of industries are affected materially 
by Easter. Department store sales are one type of series which we should 
expect to be affected. A late Easter will tend to make April sales heavy 
relative to March, and, within limits, the later in April that Easter occurs, 
the greater is this tendency. On the other hand, when Easter occurs in 
March, March sales, and possibly February sales, will be increased. 

A procedure for making the Easter adjustment is as follows:^ 

1. Express the original March and April values each as percentages of
12-month moving average. This has already been done if the seasonal has
been computed by the method advocated in this book. If a weighted
moving average has been computed (for instance, in computing a moving
seasonal), that may be used instead of the 12-month average. These
percentages are, of course, estimates of seasonal-irregular movements.

2. Subtract the March seasonal index numbers from these March percentages,
and the April index numbers from the April percentages. If a moving
seasonal has been computed, there will be a different number to subtract
each year. See columns 4 and 7 of Table 119. These are called March
and April residuals respectively, since they are variations remaining after
taking the normal seasonal movement into account. They are due in
part to irregular movements, but also to the fact that Easter sometimes
occurs early and sometimes late.

3. Subtract the March residuals from the April residuals as in column 8
of Table 119. These differences between the April and March residuals
are presumably due in large part to the varying date of Easter. Let us
call these differences Easter residuals.

4. Next, it must be discovered whether or not these second residuals
actually do vary in accordance with the date of Easter. In Chart 189
are plotted, by years, these Easter residuals and the date of Easter (data
are from Table 120). It is apparent that there is a marked tendency for
early Easter to increase March sales relative to April, and for late Easter
3 This procedure is based on an article by Leroy M. Piser, "The Adjustment of Time
Data for the Influence of Easter," Journal of the American Statistical Association, Vol.
XXIX, June 1934, pp. 190-191. Piser, however, does not include step 2 in his
computations.
to stimulate April sales. This tendency was less perfect during some of
the general depression years, especially 1932 and 1935. But this chart 

Chart 189. Date of Easter and Easter Residuals for Department Store Sales, 1919-1936.
(Data of Table 120.)

does not tell us how much, on the average, April sales are increased over
those of March for each additional day later that Easter occurs. Such an
estimate can be obtained if the residuals are plotted, not by years, but with


TABLE 120

Date of Easter, Easter Residuals, and Easter Correction Factor,
Department Store Sales Data, 1919-1936

          Date of Easter        Easter
Year      Month      Day        residuals

1919      April      20            8.7
1920      April       4
1921      March      27           -7.8
1922      April      16            6.4
1923      April       1           -5.1
1924      April      20            6.3
1925      April      12            1.8
1926      April       4           -4.9
1927      April      17            4.7
1928      April       8           -1.3
1929      March      31          -11.9
1930      April      20
1931      April       5            1.3
1932      March      27           -2.8
1933      April      16           10.8
1934      April       1           -9.7
1935      April      21            1.2
1936      April      12           [illegible]

Source: Easter residuals from Table 119. Date of Easter from New York
World-Telegram, The World Almanac and Book of Facts, 1934, p. 66.
Easter date along the horizontal axis and a trend line fitted to the data as plotted.
The slanting line of Chart 190, which was fitted by inspection, suggests that
a change in one day in the date of Easter is responsible for an increase of
about .75 per cent in April sales over those of March. The estimating
line may be fitted mathematically if desired. However, it seems unreasonable
to attach much weight to years such as 1933, for whose unusual
behavior a definite reason can be assigned.

Chart 190. Graphic Estimation of Easter Correction Factor for Department Store Sales,
1919-1936. (1933 is subjectively adjusted for closing of banks. Data of Table 120.)

It is to be noted that the estimating line is drawn horizontally throughout
March. An Easter which occurs April 1 or earlier has deprived April
of any Easter sales whatever; no more damage can be done regardless of how
early in March Easter occurs. It is, of course, possible, if people shop
very early, that an Easter occurring early in March may increase February
sales relative to March. It is not often, however, that the statistician
will consider it worth while to make that adjustment.

5. Next, read off the correction factor for each date of Easter from the 
trend line. The correction factor for all possible dates of Easter is given 
in column 2 of Table 121. 


TABLE 121

Easter Correction Factors for Department Store Sales Data

Date of       Gross correction     Net correction factor
Easter        factor               applicable to each month*
                                   [Col. 2 ÷ 2]
(1)           (2)                  (3)

March             -7.0                 -3.5
April:
   1              -7.0                 -3.5
   2              -6.2                 -3.1
   3              -5.4                 -2.7
   4              -4.6                 -2.3
   5              -3.8                 -1.9
   6              -3.1                 -1.6
   7              -2.3                 -1.2
   8              -1.5                  -.8
   9               -.7                  -.4
  10                .1                   0
  11                .9                   .4
  12               1.7                   .8
  13               2.5                  1.2
  14               3.3                  1.6
  15               4.1                  2.0
  16               4.8                  2.4
  17               5.6                  2.8
  18               6.4                  3.2
  19               7.2                  3.6
  20               8.0                  4.0
  21               8.8                  4.4
  22               9.6                  4.8
  23              10.4                  5.2
  24              11.2                  5.6
  25              11.9                  6.0

* These values are to be added algebraically to April and subtracted
algebraically from March.
Source: Read from Chart 190.


6. Now, divide the correction factor by two, since what April sales gain
by a late Easter, March sales lose, and vice versa. See Table 121,
column 3.

7. Finally, add this amount algebraically to the April seasonal index
numbers and subtract it algebraically from the March numbers, as in Table 122.


TABLE 122

Adjustment of March and April Seasonal Index Numbers for Variation in
Date of Easter, Department Store Sales Data, 1919-1937

                         Net          March seasonal             April seasonal
         Date of      correction   Uncor-     Corrected       Uncor-     Corrected
Year     Easter        factor*     rected  [Col 4 - Col 3]    rected  [Col 6 + Col 3]
(1)        (2)           (3)        (4)         (5)            (6)         (7)

1919    April 20         4.0        95.0        91.0          101.5       105.5
1920    April 4         -2.3        94.5        96.8          101.5        99.2
1921    March 27        -3.5        94.0        97.5          101.5        98.0
1922    April 16        -2.4        93.5        95.9          101.5        99.1
1923    April 1         -3.5        93.0        96.5          101.5        98.0
1924    April 20         4.0        92.5        88.5          101.5       105.5
1925    April 12          .8        92.0        91.2          101.5       102.3
1926    April 4         -2.3        91.5        93.8          101.0        98.7
1927    April 17         2.8        91.5        88.7          100.0       102.8
1928    April 8          -.8        91.5        92.3           99.5        98.7
1929    March 31        -3.5        91.5        95.0           99.5        96.0
1930    April 20         4.0        91.5        87.5          100.5       104.5
1931    April 5         -1.9        91.5        93.4          101.0        99.1
1932    March 27        -3.5        91.5        95.0          101.5        98.0
1933    April 16         2.4        92.0        89.6          101.5       103.9
1934    April 1         -3.5        92.5        96.0          101.5        98.0
1935    April 21         4.4        93.0        88.6          101.5       105.9
1936    April 12          .8        93.5        92.7          101.5       102.3
1937    March 28        -3.5        94.0        97.5          101.5        98.0

* To be added algebraically to April and subtracted algebraically from March.
Source: Tables 118 and 121.
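
The arithmetic of steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration: the function names are assumptions introduced here, and the sample figures are the April 20 and April 4 corrections of Table 121 together with the 1919 and 1920 rows of Table 122.

```python
def net_correction(full_correction):
    """Step 6: halve the correction factor read from the chart."""
    return full_correction / 2.0


def adjust_for_easter(march_index, april_index, net):
    """Step 7: subtract the net correction from March, add it to April."""
    march = round(march_index - net, 1)
    april = round(april_index + net, 1)
    return march, april


if __name__ == "__main__":
    # 1919: Easter on April 20, full correction 8.0
    print(adjust_for_easter(95.0, 101.5, net_correction(8.0)))
    # 1920: Easter on April 4, full correction -4.6
    print(adjust_for_easter(94.5, 101.5, net_correction(-4.6)))
```

Because the same amount is added to one month and taken from the other, the sum of the March and April index numbers is left unchanged by the adjustment.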


Seasonal is now removed in the usual way, dividing the original data 
by the revised seasonal index. If a series is deseasonalized by a seasonal 
index that takes no account of the effect of the varying date of Easter, 
the adjusted data will contain not only the ordinary type of irregular
movements, but additional irregularities which are due to the varying date 
of Easter. If the seasonal index which is used takes account of the vary- 
ing date of Easter, the deseasonalized data will therefore be smoother. It 
is possible that accidental variations between March and April may mis- 
takenly be attributed to variation in the date of Easter, and the correc- 
tion for Easter may, in fact, become merely a roundabout method of 
smoothing out such irregular movements. The closeness of the dots to 
the slanting line of Chart 190 makes it seem highly improbable that this
qualification is of much importance in the case of department store sales.
Bearing this qualification in mind, however, and the similar one mentioned 
on page 508 concerning the moving seasonal, it should be observed that 
the deseasonalized data shown in the three parts of Chart 191 become 
progressively smoother as adjustment is made by progressively more
refined seasonal indexes.





Chart 191. Department Store Sales and Different Adjustments for Seasonal
Variation, 1919-1937: A. Adjustment for Stable Seasonal, B. Adjustment for Moving
Seasonal, C. Adjustment for Moving Seasonal and Varying Date of Easter. (Computed
from original data of Chart 185.)

Sudden changes in entire seasonal pattern. In Chapter XVII our study
of the seasonal behavior of United States magazine advertising led to the 
conclusion that there was a material change in the seasonal pattern after 
1929. The remedy adopted was to break the whole period into two sub- 
periods: one for the years before 1930, and the other from 1930 on. Per- 
haps an even more clear-cut illustration is that of automobile production. 
Before 1935 it was customary to hold the New York automobile show in 
January. In 1935, however, a show was held in November (as well as in 
the preceding January); and since 1935 there has been but one show a 
year, that being in early November. The result has been two seasonal 
lows and two seasonal highs a year, instead of one. Whereas previously 
the low was in the fall and the high in the spring, a few months after the 
show, now there is one low just preceding and another just after the show, 
and a high coinciding closely with the show as well as one in the spring,
about April. Consequently it seems advisable to compute two seasonal 
indexes rather than one. In part A of Chart 192 are shown the original 
data and the data as deseasonalized by a stable seasonal, while part B of 
the chart shows the data deseasonalized by two seasonal indexes, one run- 
ning from April 1930 through March 1935, and the other from April 1935 
through March 1937. The reason that a year from April through March, 
rather than a calendar year, was taken is that the automobile show did 
not change the fundamental pattern, which is a spring high and a fall low, 
but merely added another irregularity during the winter. By breaking 
the year between March and April, the problem of how to handle the 
calendar year 1935 was solved. Of course, the seasonal index for the last 
two 12-month periods is not very reliable, being an average of only two 
items. Further discussion of these indexes will be given on page 527.

Short-time shifts in timing. The varying date of Easter affects materially
only March and April, and the automobile show affects chiefly a few
months preceding and following it. Weather conditions, however, which 
also vary from year to year, may result in early harvests one year and late 
harvests the next; and not only may the marketing of the product begin 
at different times in different years, but the flow of goods during the entire 
year may be affected, the effect being to shift the whole pattern a few 
months to the left or right. Likewise, consumer demand may vary in 
timing, depending on how early the weather changes.

Such shifting seasonal patterns present a difficult problem. Perhaps 
the most practical solution is to regard the problem as a special case of a 
sudden change in entire pattern, to group together the years (not neces- 
sarily adjacent) which show the same timing in their seasonal turns, and 
to compute as many seasonal indexes as there are groups of years. In 
computing such indexes, there is no reason why the calendar year must be
taken as a unit. Rather, if the subject matter has to do with agriculture,
the year should be related to the crop year. Perhaps the central month
should be the seasonal high or the seasonal low.

Chart 192. United States Passenger Car Production as Deseasonalized (A) by Single
Index and (B) by Separate Indexes for Two Sub-periods, April 1930-March 1937.
(Original data were compiled by the United States Bureau of the Census in cooperation
with the Automobile Manufacturers Association, and published in U. S. Department
of Commerce, Survey of Current Business, 1932 Supplement, pp. 274-275, 1936 Annual
Supplement, p. 147, March 1937, p. 55.)

Varying amplitude. Some economic series retain more or less the same 
general seasonal pattern from year to year, but have a tendency to vary 
rather suddenly in amplitude. This is particularly true of stocks of agri- 
cultural commodities. For example, stocks of agricultural crops show 
varying seasonal amplitude from year to year depending upon the amount 
carried over from the preceding year, the size of the harvest, and the 
amount consumed. Likewise, shipments of livestock are likely to vary 
in the amplitude of their seasonal swing. Here the variation may have 
something to do with the advantage of immediately selling the livestock, 
as compared with holding them for further fattening or a price increase. 
Since the relative advantages of these policies, as explained on page 156,
are likely to vary in cycles, so the amplitude of the seasonal variation is
likely to change in cycles, and the change in pattern might conceivably 
be treated as a moving seasonal. Another borderline case is that of in- 
creased seasonal amplitude in manufacturing, brought about by a general 
cyclical tendency toward hand-to-mouth buying. It is apparent that this 
change also might be thought of as a moving seasonal, the progression 
being cyclical rather than trend-like.

It must be apparent that, when the amplitude is not changing gradually 
but changing suddenly, and in the main unpredictably, a moving seasonal 
cannot overcome this problem any better than it can that of short-time 
shifts in entire pattern or in timing. Any of the types of seasonal hitherto
described would in some years over-correct the data and in other years 
under-correct it. The object of the following procedure then is: (1) to
discover for each year whether the seasonal amplitude is greater or less
than average (more specifically, to measure how much the actual seasonal
varies with a given change in the seasonal index); (2) to adjust the seasonal
index so that it will have the correct amplitude. The procedure* is
somewhat analogous to the Easter adjustment. Receipts of sheep and 
lambs at primary markets are taken as an illustration. 

1. Express the seasonal index for each year as deviations from 100. See 
column 3, Table 123. 

2. Express the original data as percentages of 12-month moving average.
A weighted moving average may be substituted for the 12-month moving
average if preferred. In the present illustration a 12-month moving average
was smoothed slightly by inspection. The percentages are in column 4
of Table 123.

* This procedure is based upon an article by Simon S. Kuznets, “Seasonal Pattern
and Seasonal Amplitude: Measurement of their Short-Term Variations,” Journal of
the American Statistical Association, Vol. XXVII, March 1932, pp. 9-20.

TABLE 123

Seasonal Index and Percentage Deviations from 12-month Moving Average
Adjusted to Average Zero, for Receipts of Sheep and Lambs, 1934

                 Seasonal index            Percentage of 12-month moving average
                          Percentage    Uncorrected   Corrected      Percentage
             Percentage   deviation     percentage    percentage     deviation
Month                   [Col 2 - 100]                [Col 4 × c]*  [Col 5 - 100]
 (1)            (2)         (3)            (4)           (5)           (6)

January          84         -16            85.2           86           -14
February         75         -25            68.0           68           -32
March            80         -20            73.0           74           -26
April            93          -7            85.1           86           -14
May              98          -2            97.4           98            -2
June             92          -8            82.8           83           -17
July             94          -6            97.8           98            -2
August          119          19           118.6          119            19
September       143          43           149.7          151            51
October         150          50           182.3          184            84
November         95          -5            82.4           83           -17
December         76         -24            69.5           70           -30

Total         1,199          -1         1,191.8        1,200             0
Average         100           0                          100             0

* c = 1200.0 ÷ 1191.8 = 1.00688.

Source: Original data from United States Department of Commerce, Survey of Current Business, 1932
Annual Supplement, p. 165; 1936 Supplement, p. 96; March 1937, p. 43.


3. Adjust these percentages so that they will total (and therefore average)
zero, algebraically, for each year. This may be done by averaging alge- 
braically the percentages for each year, and then subtracting these aver- 
ages from each of the items for the corresponding year; or the percentages 
may be multiplied in the usual fashion by a correction factor so that they 
will total 1200, and then 100 may be subtracted from each corrected per- 
centage. The latter procedure is followed here, and is shown in columns 5 
and 6 of Table 123. 

4. A comparison of these two sets of data will tell us which has the 
greater amplitude, the seasonal index (column 3) or the percentages of 
moving average (column 6). From Chart 193 it is apparent that in 1934 
the latter varied more, while in 1932 the seasonal index had the greater 
amplitude. The 1934 seasonal index would, therefore, fail to remove all 
of the seasonal, while in 1932 the seasonal index would over-correct the 
data. (Incidentally it might be noted that, except for amplitude, the 
variations conform rather closely to the seasonal index in both of these 
years.) 



Chart 193. Stable Seasonal Index and Percentage Deviations from 12-Month Moving
Average of Receipts of Sheep and Lambs, 1932 and 1934. (Stable seasonal is indicated
by broken line; percentage deviations by solid line. For source of original data
see Table 123. 1934 data are from Table 123.)

A rearrangement of the material of Chart 193 will yield more useful
information. In this chart the deviations were arranged chronologically 
(by months) and connected by straight lines. If the horizontal scale is 
used to represent the values of the seasonal index, or x, and the vertical 
axis to represent deviations from moving average, or y, as in Chart 194,
we can estimate about how much, on the average, the percentage deviations 
vary as the seasonal index numbers vary. The broken diagonal straight line 
of the 1934 chart informs us that, for every increase of 1 per cent in x (the 
seasonal index), the y values change (in the same direction) about 1.4
per cent. The solid line slanting at an angle of 45 degrees, with the equation
y = x, represents the relationship that would exist between the variables
if the amplitude of the seasonal index and the deviations from moving
average were the same. Apparently the actual seasonal amplitude was
greater than average in 1934 (since the broken line has a greater slope
than the solid one); but in 1932 it was apparently slightly less than normal.
These lines of relationship can be fitted by inspection (as were the Easter
adjustment lines), or mathematically by the general method of fitting trend
lines described on pages 395-408. The mathematical method for 1934 is
shown in Table 124. Since both sets of data were made to average zero,
the estimating line passes through the point x = 0, y = 0, and a in the usual
straight line equation becomes zero. The equation for 1932 is y = .89x,
while that for 1934 is y = 1.39x. Note that the values of b for the different
years may be called amplitude ratios. (Again it might parenthetically
be noticed that the dots are fairly close to the line, suggesting that most
of the difference between the two series is explained by a difference in
amplitude.)

Chart 194. Seasonal Index as Percentage Deviations (on Horizontal Scale) and
Percentage Deviations from 12-Month Moving Average (on Vertical Scale) together
with Seasonal Amplitude Correction Line, for Receipts of Sheep and Lambs, 1932 and
1934. (For source of original data see Table 123. 1934 data are from Table 123.)


TABLE 124

Computation of Amplitude Ratio, and Correction of Seasonal Index for
Amplitude, for Receipts of Sheep and Lambs, 1934

                         Deviations
             Seasonal   from moving                                      Corrected
              index       average                                        seasonal
Month           x            y           xy         x²     y_c = 1.39x   100 + y_c

January        -16          -14          224        256       -22           78
February       -25          -32          800        625       -35           65
March          -20          -26          520        400       -28           72
April           -7          -14           98         49       -10           90
May             -2           -2            4          4        -3           97
June            -8          -17          136         64       -11           89
July            -6           -2           12         36        -8           92
August          19           19          361        361        26          126
September       43           51        2,193      1,849        60          160
October         50           84        4,200      2,500        70          170
November        -5          -17           85         25        -7           93
December       -24          -30          720        576       -33           67

Total           -1            0        9,353      6,745        -1        1,199

Source: Table 123.

$b = \dfrac{\sum xy}{\sum x^2} = \dfrac{9{,}353}{6{,}745} = 1.39$

Equation: $y_c = 1.39x$.


5. In order to correct our seasonal index for amplitude variations, we 
must substitute the different values of x in our equation; that is, multiply
each of our seasonal indexes (taken each year as deviations from 100) by 
the appropriate amplitude ratio. (The amplitude ratio will usually be dif- 
ferent for each year.) This is done in the yc column of Table 124. In 
1934 plainly the effect of this procedure is to increase the amplitude of 
the seasonal index. In 1932 the opposite result is obtained. Finally we 
must convert these yc values into percentages averaging 100 by adding 100 to 
each number, as in the last column of Table 124.
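
Steps 1 through 5 can be traced in a short Python sketch that reproduces the 1934 figures of Tables 123 and 124. The function and variable names are illustrative assumptions, and the rounding rule (halves rounded away from zero, with b rounded to two decimals before use) is inferred from the printed tables rather than stated in the text.

```python
from decimal import Decimal, ROUND_HALF_UP

# Columns 2 and 4 of Table 123 (receipts of sheep and lambs, 1934).
SEASONAL_INDEX = [84, 75, 80, 93, 98, 92, 94, 119, 143, 150, 95, 76]
PCT_OF_MOVING_AVG = ["85.2", "68.0", "73.0", "85.1", "97.4", "82.8",
                     "97.8", "118.6", "149.7", "182.3", "82.4", "69.5"]


def half_up(value):
    """Round a Decimal to the nearest whole number, halves away from zero."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def amplitude_correction(index, pct_of_ma):
    # Step 1: seasonal index as deviations from 100.
    x = [s - 100 for s in index]
    # Step 3: scale the percentages to total 1200, then subtract 100.
    pct = [Decimal(p) for p in pct_of_ma]
    c = Decimal(1200) / sum(pct)
    y = [half_up(p * c) - 100 for p in pct]
    # Step 4: amplitude ratio b for a line through the origin.
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b = (Decimal(sum_xy) / Decimal(sum_xx)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Step 5: multiply the deviations by b and add back 100.
    corrected = [100 + half_up(b * xi) for xi in x]
    return x, y, b, corrected


x, y, b, corrected_seasonal = amplitude_correction(
    SEASONAL_INDEX, PCT_OF_MOVING_AVG)

if __name__ == "__main__":
    print("amplitude ratio b =", b)
    print("corrected seasonal:", corrected_seasonal)
```

Decimal arithmetic is used here only so that products such as 1.39 × 50 = 69.5 round the same way as in the printed table; ordinary floating-point numbers would serve equally well for practical work.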

Although the procedure for 1934 only is here illustrated, each year must
be dealt with separately in similar fashion. After the corrected seasonal
of step 5 is obtained, seasonal variation is eliminated in the usual manner,
by dividing the original data by the corrected seasonal. The results of
these operations can be seen in the two parts of Chart 195. Note that
adjustment by seasonal indexes of varying amplitude gives smoother
results than adjustment by a stable seasonal.

Chart 195. Receipts of Sheep and Lambs as Deseasonalized (A) by Stable Index
and (B) by Index of Varying Amplitude, 1928-1935. (For source of original data see
Table 123.) (Vertical scale in millions of animals.)

A word of caution is in order. If a moving seasonal has been used, a 
change in the amplitude ratio does not necessarily indicate a change in 
the seasonal amplitude of the original data. A gradual increase in the 
seasonal amplitude, for instance, would be reflected in the moving seasonal 
index rather than in the amplitude ratio; but the moving seasonal would
fail to register any sudden departures from the general trend in amplitude
change. 

Further Refinements of Method 

Continuity of seasonal indexes. A stable seasonal index averages 100 
per cent, not only for the 12-month period selected for the index, but for 
any consecutive 12-month period. The latter, however, is not true for 
any of the seasonals explained in this chapter, though in the case of a 
progressive or moving seasonal the discrepancy is nominal only. Par- 
ticularly in the case of seasonal indexes corrected for variations in ampli- 
tude, however, the discrepancy may assume alarming proportions. The 
difficulty manifests itself in discontinuity of the seasonally adjusted data
at the point where one year ends and the next begins. Let us assume, for 
instance, that the unadjusted seasonal index numbers for December 1930 
and January 1931 are each 80 per cent, the amplitude adjustment to be 
applied, let us say, to calendar years. Now, suppose further that the 
amplitude ratios are .5 and 1.5 respectively. This makes the adjusted 
December 1930 index number 40 per cent and the January 1931 number 
120. It is apparent that there will be an enormous drop in the seasonally 
adjusted data between December and January. Yet a little thought will 
convince one that the change in amplitude does not take place entirely 
in a month’s time, but represents a transition of several months’ duration. 

Although there is no entirely satisfactory solution for this difficulty, one
remedy, which is very laborious, is to compute an amplitude ratio for 
each consecutive 12-month period of the entire series. For instance, if 
the data ran from 1926 through 1936, the first 12-month period would 
run from January 1926 through December 1926, the second from February 
1926 through January 1927, and so on. Altogether there would be 121 
such 12-month periods and the same number of amplitude ratios. We could 
speak of these ratios collectively as a moving amplitude ratio. Following 
the analogy of a 12-month moving average, these ratios should be centered 
by a 2-month moving average, leaving 120 amplitude ratios, running from 
July 1926 through June 1936. The seasonal index numbers are then multiplied
by these amplitude ratios to obtain the final seasonal index numbers.
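
The counting in this paragraph (121 overlapping periods, reduced to 120 by centering) can be checked with a rough Python sketch. The function names are assumptions, the index deviations are those of Table 123, and the y series is artificial, constructed as 1.5 times x so that every ratio is known in advance. The sketch also omits the re-centering of each 12-month window on zero that the text calls for.

```python
def amplitude_ratio(x, y):
    """Slope of the line through the origin: sum(xy) / sum(x^2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def moving_amplitude_ratios(x, y, window=12):
    """One amplitude ratio for each consecutive 12-month period."""
    return [amplitude_ratio(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]


def center(ratios):
    """Center the ratios with a 2-month moving average."""
    return [(a + b) / 2 for a, b in zip(ratios, ratios[1:])]


# Stable seasonal index (deviations from 100) repeated for 11 years,
# i.e. 132 months, as for data running from 1926 through 1936.
index_dev = [-16, -25, -20, -7, -2, -8, -6, 19, 43, 50, -5, -24]
x = index_dev * 11
y = [1.5 * v for v in x]  # artificial deviations, amplitude ratio 1.5

ratios = moving_amplitude_ratios(x, y)
centered = center(ratios)

if __name__ == "__main__":
    print(len(ratios), len(centered))
```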

This procedure is laborious, but it is not entirely satisfactory. Although 
there is no sharp break in the continuity of the series, it has the defect 
that not any 12 consecutive seasonal index numbers are centered on 100 
per cent. A less accurate but also much less laborious procedure than the 
one just described is to compute an amplitude ratio for each standard 
year, center the ratio on the sixth or seventh month, and interpolate 
arithmetically from one year to the next.

Combinations of seasonal types. It is frequently true that the seasonal
variation may be gradually changing in pattern, shifting in its timing, and
varying in amplitude, or some combination of the three. Thus, flaxseed
stocks, during the crop years from July 1923 through June 1936, had 2
September peaks, 9 October peaks, and 2 November peaks, but also varied
tremendously from year to year in amplitude of fluctuation, the 1927-1928
amplitude ratio being 1.69 and that for 1929-1930 being .44. The procedure
for obtaining final seasonal indexes for these data would be: (1)
break data into sub-periods according to occurrence of seasonal high; (2) 
compute stable seasonal for each such sub-period; (3) using these seasonal 
indexes, compute amplitude ratios for each year (possibly using the method 
of interpolation described above) ; (4) multiply the seasonal index numbers 
by the appropriate amplitude ratios. 

Other combinations of seasonal types require different treatment. Con- 
siderable ingenuity is frequently required to measure and eliminate sea- 
sonal variation successfully. Unfortunately, there is no way of telling 
when we have arrived at the best solution of the problem. Complexity 
of procedure does not guarantee that the results obtained accurately de- 
scribe the movement which we set out to measure. Particularly if the 
data are originally unreliable, great refinement of method is likely to be 
largely wasted effort. 

Correction by subtraction of seasonal. It occasionally happens that 
grotesque results are obtained when seasonal is eliminated by dividing by 
a seasonal index. This is especially likely to be the case when the seasonal 
movement typically falls almost to zero at one or more months. Then, 
if in any given year the series remains materially above zero for those 
months, division by the extremely low seasonal index percentage will raise 
the deseasonalized data to a very sharp peak. This is true to a lesser 
degree when the series is characterized by a single sharp seasonal peak 
each year. 

A simple expedient is as follows. Compute a seasonal index by what- 
ever method seems appropriate. The index is now converted into terms 
of the original data by multiplying the seasonal index numbers (expressed 
as percentage deviations) each year by the average value of the original 
series for that year. Seasonal is then eliminated by subtracting, algebraic- 
ally, the seasonal index from the original data. 
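
A minimal Python sketch of this expedient follows. All figures in it are hypothetical and chosen only for illustration: a 12-month index totaling 1200, a year of data built to follow that index exactly around an average of 50, and a single extreme month used to contrast the two methods.

```python
def deseasonalize_by_subtraction(original, seasonal_index):
    """Subtract the seasonal, expressed in units of the original data."""
    year_average = sum(original) / len(original)
    # Percentage deviations of the index, converted to original units.
    seasonal_abs = [(s - 100) / 100 * year_average for s in seasonal_index]
    return [o - a for o, a in zip(original, seasonal_abs)]


# Hypothetical index (totals 1200) and a year of data that follows it
# exactly around an average level of 50.
index = [80, 90, 100, 110, 120, 100] * 2
original = [50 * s / 100 for s in index]
adjusted = deseasonalize_by_subtraction(original, index)

# Hypothetical extreme month: index of 5, actual value 10, yearly average 50.
by_division = 10 * 100 / 5                       # division method
by_subtraction = 10 - (5 - 100) / 100 * 50       # subtraction method

if __name__ == "__main__":
    print(adjusted)
    print(by_division, by_subtraction)
```

In the extreme month the division method raises the adjusted figure to four times the yearly average, while the subtraction method leaves it near the average, which is the contrast the preceding paragraphs describe.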

It may be desirable to compute the index number, in the first instance, 
in such a way as to obtain a seasonal index in absolute rather than relative 
terms. This will be so if the seasonal movements each year seem to be 
similar in absolute magnitude rather than in percentage deviations.
Inspection of a chart of the data may indicate whether this is true. If the
evidence indicates that an index of absolute deviations should be computed,
it is necessary only to adapt one of the methods with which the
reader is familiar. For instance, if the moving average method is used,
the moving average is subtracted from, instead of divided into, the original
data; and the index from that point is constructed as usual, the final index
being adjusted to total zero by the subtraction of a correction factor.
Incidentally, it might be noted that any of the devices explained earlier in
this chapter may be based on the subtraction method of computing seasonal.
The link relative method (described in the preceding chapter) can
also be adapted very easily as follows: (1) Obtain link differences by
subtracting the preceding month from each month; (2) average these link
differences, month by month; (3) let the first month link difference be
zero, and chain the links by successive addition; (4) correct chain differences
for trend by successive subtraction of correction factor; (5) adjust
chain differences to total zero by subtraction of a constant correction
factor.

Chart 196. United States Passenger Car Production as Deseasonalized (A) by Division
Method and (B) by Subtraction Method, April 1930-March 1937. (For source of
original data see Chart 192.) (Vertical scale in thousands of cars.)

It may be apparent, from inspection of the plotted time series, whether 
to compute a seasonal based on percentages or on differences, or whether 
to adapt the ordinary seasonal index so that the correction may be made 
by subtraction. However, it may be necessary to try several approaches 
until a satisfactory method has been found. The two parts of Chart 196 
indicate the difference between the division method and the subtraction 
method of correcting passenger car production for seasonal. Two seasonal 
indexes were used in each section of the chart, the second index beginning 
with April 1935. 

Logical basis of methods of construction. With the exception of the 
adjustment for Easter, the methods described in this chapter are more or 
less empirical in nature, depending for their validity upon the results which 
they produce. A method is held to be satisfactory if the deseasonalized
data (1) do not show similarity of intra-year pattern (other than cyclical)
in different years; (2) are not extremely irregular in their movements; and
(3) are of about the same magnitude as the original data in 12-month
periods.

The Easter adjustment, on the other hand, attempted to find a func- 
tional relationship between April sales minus March sales and the date of 
Easter. Carrying this idea further, it might be possible to find a numerical
relationship over time between length of daylight and sales of incandescent
lamps; or between temperature and sales of ice; or between a combination
of temperature and snowfall and sales of galoshes. Computation of seasonal
indexes by such a method would carry us far into the field of correlation,
which is treated in the last four chapters of this book. Furthermore,
it would be difficult to measure the importance, let us say, of Christmas
by correlating sales with some other factor.

Intermediate between these two types of methods is that which obtains
a first approximation seasonal index by an empirical method; and then
seeks to smooth this index by fitting a curve to the seasonal index numbers 
on the theory that the seasonal movement would present a smooth pattern 
if the period covered were long enough to permit an exact cancelling out 
of all irregular movements. Freehand smoothing of the seasonal curve is 
practiced by a few statisticians. The fitting of a mathematical curve is 
not usually advocated. Indeed, it would be easy to find logical objections 
to a simple curve fitted to most data. Usually there are social factors 
that disturb the smoothness of contour inherent in a simple mathematical 
curve. 


Weekly Seasonal 

There has been a tendency in recent years to attempt to furnish im- 
portant economic data promptly and at frequent intervals, so that users 
can be familiar with the current situation rather than with that of a 
month or two ago. Naturally, weekly or daily series have rather wide 
irregular variations, but the calendar variations and seasonal swings also 
require careful attention. It is especially important to adjust such data 
for holidays, since one day of idleness in a week of six working days is a 
difference of 16.7 per cent. The steel operations data used in the follow- 
ing illustration, however, do not require such adjustment since they are 
expressed as percentages of capacity. 

There is little that is new in the computation of a weekly seasonal. It 
is similar to the moving average method discussed in Chapter XVII. As
with that method, an attempt is made to estimate values which consist of 
combined trend and cycle, and the original data are then expressed as 
percentages of these values. From these percentages the seasonal index 
is computed. 

Trend X Cycle estimates for each week of each year are not obtained 
by means of a 52-week moving average (corresponding to the 12-month 
moving average used in computing a monthly seasonal). Since the num- 
ber of days in a year is not a multiple of seven, the 52 weeks in a year 
end (or center) on different dates in different years. Thus the first week 
ending in a given year may end on January 1, 2, 3, 4, 5, 6, or 7. Because 
a given week may end on any one of the seven dates, there will be in 
general one-seventh as many observations for a week ending on a given
date as there are years under consideration. Consequently, if percentages 
of a 52-week moving average are arrayed by dates in the year and averages 
taken to eliminate irregular and extraneous cyclical movements (in accordance
with the method used in computing a monthly seasonal index), the
observations for any week will ordinarily be too sparse to obtain a typical 
value for that week. Therefore a more accurate estimate of Trend X
Cycle must be obtained, one which reaches into the cyclical peaks and
troughs more faithfully. The method of making this estimate will be
explained in the following paragraphs, as will the other steps involved in
computing the index.*

Another difference between a monthly seasonal and a weekly seasonal 
should be noticed. Although a monthly seasonal index requires only 
twelve index numbers, one for each of the twelve months, a weekly sea- 
sonal index requires not merely one number for each of the 52 weeks, but 
365 numbers (366 for leap years), which is one for every week ending on 
each possible date. 

1. Obtain monthly data. If comparable monthly and weekly data are 
not available, monthly data may be obtained by taking monthly averages 
of the weekly data, as in column 3 of Table 125. 

If the data are for weeks ending on specified dates, the results should 
be placed opposite the week containing the 15th of the month; if the dates 
specified are for the center of the week, the results should be placed opposite 
the date nearest the 15th.® The former is the procedure followed in 
Table 125, column 3. (In this table the years 1927 and 1936 only are 
shown.) 

2. Compute a monthly seasonal index. The seasonal index is shown in 
column 4 of Table 125. 

3. Adjust monthly data for seasonal movement by dividing them by the 
seasonal index. See starred items of column 5, Table 125. 

4. Obtain approximations of weekly values of Trend X Cycle by arithmetic 
interpolation of the adjusted monthly values. See column 5. This method 
is not perfect, since the deseasonalized data also contain irregular move- 
ments and the interpolation process does not entirely eliminate them. 

5. Express the original weekly data as percentages of these estimates. 
These percentages are supposed to contain only seasonal movement and 
irregular variations. 

6. Tabulate these percentages according to day of month, as in Table 126, 
column 2. The items marked A are from column 6 of Table 125 and 


* See “A Method of Calculating Weekly Seasonal Indexes,” by Leroy M. Piser,
Journal of the American Statistical Association, Vol. XXVII, September 1932, pp. 307-
309.

® Maximum accuracy is obtained for monthly averages of weekly data if the data 
are taken for weeks centering on specified dates, and fractional weeks at the beginning 
and end of each month are given fractional weights. The labor of such a procedure is
probably not justified by the added accuracy obtained. According to the method
followed in Table 125, the monthly averages do not result in estimates for calendar 
months, but for periods beginning and ending a few days before the end of the month. 
This does not impart a bias to the weekly index, however, since the weekly data are 
for weeks ending, rather than centering, on specified dates in the month. 
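
Steps 3 through 5 can be illustrated with a short Python sketch using the January and February 1927 figures of Table 125. The function names are assumptions introduced for this illustration, and rounding each intermediate value to one decimal place is inferred from the printed table rather than stated in the text.

```python
def deseasonalize(monthly_average, seasonal_index):
    """Step 3: divide the monthly average by the monthly seasonal index."""
    return round(100 * monthly_average / seasonal_index, 1)


def interpolate(start, end, weeks_apart):
    """Step 4: straight-line interpolation between two mid-month values."""
    step = (end - start) / weeks_apart
    return [round(start + k * step, 1) for k in range(weeks_apart + 1)]


# Per cent of capacity for weeks ending January 17 through February 21, 1927.
weekly = [76.5, 76.5, 78.0, 79.0, 81.0, 83.5]

january = deseasonalize(76.5, 96.6)     # monthly average and seasonal index
february = deseasonalize(82.6, 108.0)

# The two mid-month weeks are five weeks apart.
trend_cycle = interpolate(january, february, 5)

# Step 5: weekly data as percentages of the Trend X Cycle estimates.
percentages = [round(100 * w / t, 1) for w, t in zip(weekly, trend_cycle)]

if __name__ == "__main__":
    print(trend_cycle)
    print(percentages)
```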



TABLE 125 


Computation op Percentages of Estimated Trend X Ctcle Movements op R^te 
OP Steel Operations op Entirb United States Industry, 1927-1936 


Year, month 
and day 
(week endmg 
Monday.) 

(1) 

Per cent of 
capacity 

(2) 

Monthly 
average 
centered 
on week 
including 
15th 
(3) 

1927 



January 3 

75 0 


10 

76 5 


17 

76.5 

76 5 

24 

76 5 


31 

78.0 


February 7 

79 0 


14 

81.0 


21 

83 5 

82.6 

28 

87 0 


March 7 

87.5 


14 

91 5 


21 

92 5 

90 9 

28 

92 0 


April 4 

90.0 


11 

88.5 


18 

86 5 

87.3 

25 

84.0 


May 2 

82 0 


9 

810 


16 

80 0 

80.9 

23 

815 


30 

80.0 


June 7 

75.5 


14 

74 0 


21 

710 

72 9 

28 

710 


July 4 

I 67.5 


11 

66.5 


18 

67 0 

67.4 

25 

68.5 


August 1 

68.5 


8 

65 5 


15 

66.0 

66,8 

22 

66.0 


29 

68 0 



Monthly 
seasonal 
index 
(4) 

Estimated 
Trend X 
Cycle 
(5) 

Per cent of 
Trend X Cycle 
[Col. 2 ÷ Col. 5] 
(6) 

96 6 

79 2* 

96 6 


78.7 

97 2 


781 

99 9 


77 6 

1018 


77 0 

105.2 

108.0 

76 5* 

109 2 


77 8 

111 8 


791 

110.6 


80 4 

113 8 

111.2 

817* 

113 2 


80 6 

114.1 


79.5 

113 2 


78 4 

112.9 

113.0 

77 3* 

111.9 


76.2 

110 2 


75 0 

109.S 


73 9 

109 6 

lil.l 

72 8* 

109 9 


72 7 

112.1 


72 6 

110 2 


72 4 

104 3 


72.3 

102.4 

101 0 

72 2* 

98.3 


72 0 

98.6 


71.7 

94 1 


714 

93 1 

94.6 

712 

94 1 


71 2* 

96 2 


71.3 

96.1 


71.4 

917 

93.5 

71.4* 

92.4 


71.1 

92 8 


70.8 

96 0 


TABLE 125 (Continued) 

September 5 

67 5 



70 6 

95 6 

12 

65 0 



70.3 

92.5 

19 

62 0 

64 6 

92 3 

70 0* 

88 6 

26 

64 0 



69 7 

918 

October 3 

65 0 



69 4 

93 7 

10 

66 0 



69 0 

95.7 

17 

64 0 

65.1 

94 8 

68 7* 

93 2 

24 

65 0 



69 4 

93 7 

31 

65 5 



701 

93 4 

November 7 

66 0 



70.8 

93 2 

14 

67 0 



715 

93.7 

21 

68 5 

66.9 

92.6 

1 72 2* 

94 9 

28 

66 0 



72 1 

91 5 

December 5 

610 



72 0 

84.7 

12 

63 5 



718 

88 4 

19 

67 5 

65 5 

914 

71 7* 

94.1 

26 

70.0 



73.1 

95 8 


1936 






January 6 

48 0 1 



55.1 

8'7.1 

13 

510 



53 6 

95 1 

20 

510 

50.3 

96.6 

52.1* 

97.9 

27 

510 



51.2 

99.6 

February 3 

50 5 



50 3 

100.4 

10 

52.0 



49 4 

105.3 

17 

53.0 

52.4 

108.0 

48.5* 

109 3 

24 

54.0 



48.9 

110.4 

March 2 

55.0 



49.3 

111.6 

9 

56.0 



49.7 

112.7 

16 

58 0 

55.7 

111.2 

50.1* 

115 8 

23 

50 5 



52.0 

97.1 

30 

59.0 



53 9 

109.5 

April 6 

63.0 



55 8 

112.9 

13 

66 0 



57 7 

314.4 

20 

70.0 

67.4 

113.0 

59 6* 

117.4 

27 

70.5 



60.2 

117.1 


TABLE 125 (Continued) 

May 4 

1 70 0 



60 9 

114 9 

11 

! 69 0 



61 6 

112 0 

18 

1 69 0 

69 1 

111 1 

62.2* 

110 9 

25 

68 5 



64 0 

107.0 

June 1 

i 68 5 



65 9 

103.9 

8 

69 5 



67 7 

103.7 

15 

70 5 

70 2 

1010 

69 5* 

1014 

22 

71 5 



701 

102 0 

29 

71 5 



70 7 

i 101 1 

July 6 

65 5 



713 

91 9 

13 

1 67 0 



71.9 

93 2 

20 

70 0 

68 6 

94 6 

72 5* 

96 6 

27 

72 0 



73 6 

97 8 

August 3 

72 0 



74.6 

96.5 

10 

715 



75.7 

94.5 

17 

70 5 

71.8 

93.5 

76.8* 

91.8 

24 

72.5 



77 0 

94 2 

31 

72.5 



77.3 

93.8 

September 7 

69 0 



77 5 

89 0 

14 

710 



77.8 

91 3 

21 

73 5 

72 0 

92 3 

78 0* 

94 2 

28 

74.5 



78.3 

95 1 

October 5 

75.5 



78.6 

96 1 

12 

75.5 


i 

78.8 

95 8 

19 

75 0 

75.0 

94.8 

79.1* 

94 8 

26 

74 0 



79 3 

93 3 

November 2 

74 0 



79.6 

93 0 

9 

74 5 



79 8 

93.4 

16 

74 5 

74 5 1 

92 6 

80 5* 

92.5 

23 

74.5 



811 

919 

30 

75.0 



817 

91.8 

December 7 

77.0 



82 2 

93.7 

14 ! 

80.0 



82.8 

96.6 

21 

81.0 

76 2 

91.4 

83.4* 

97.1 

28 

68 0 






* These items are Col. 3 ÷ Col. 4. Other items in the column are obtained by interpolation. 
Source: Revised data furnished to writers by Standard Statistics Co. 
TABLE 126 (Continued) 

Computation of Weekly Seasonal of Rate of Steel Operations, 1927-1936 


Month 
and 
day 
(1) 

Percentages of 
Trend X Cycle 
(2) 

Averages 
and inter- 
polations 
(3) 

Moving 
modified 
mean† 
(4) 

Smoothed 
moving 
mean 
(5) 

Weekly 
seasonal 
[Col. 5 X 1.00221]# 
(6) 

February 







10 

109 8 

105 3A 

107 6 

108 2 

108 0 

108 2 ^ 

11 

105 3 

110.7 

108 0 

108.2 

108 0 

108 2 

12 

107 0 


107 0 

107 5 

108 0 

108 2 

13 

109 3 

110.2 

109 8 

108.0 

108 0 

108 2 

14 

106.2A 


105 2 

108 5 

108 5 

108 7 

15 

109.1 


109.1 

109 5 

109 0 

109 2 

16 

109 5 


109 5 

108 9 

109 5 

109 7 

17 

110 4 

109 3A 

109 8 

109 5 

109 5 

109 7 * 

18 

108 9 

107 0 

108.0 

109 7 

109 5 

109 7 

19 

111 1 


111.1 

109 6 

109.5 

109 7 

20 

> 107 8 

111.7 

109.8 

109 0 

109 5 

109 7 

21 

109 2A 


109 2 

109.9 

109 5 

109 7 

22 

104 2 


104.2 

109 9 

109 5 

109.7 

23 

110 6 


110 6 

109 1 

109 5 

109.7 

24 

111 1 

110 4 A 

110 8 

109.6 

109.5 

109 7 * 

25 

109 0 

105 9 

107.4 

110 2 

110 0 

110 2 

26 

115 4 


115.4 

110.6 

110 5 

110 7 

27 

108 2 

110 1 

109 2 

110 2 

110 5 

110 7 

28 

1118A 


1118 

110.6 

110 5 

110 7 

December 






1 

i 

91 1 


91.1 

916 

91 5 

i 91.7 

2 

90 3 

96.7 

93 5 

913 

91.5 

! 91 7 

3 

94.0 

86.1 

S 90 0 

90.3 

90 5 

90.7 

4 

89 7 


89.7 

904 

90 5 

90.7 

5 

84 7A 

95.0 

89 8 

90 4 

90 5 

90.7 

6 



915 

90.7 

90 5 

90.7 

7 

92 7 

93 7A 

93.2 

917 

910 

91.2* 

8 

90 9 


90.9 

1 917 

915 

917 

9 

88.9 

96 6 

92 8 

1 92.0 

91.5 

91.7 

10 

97.2 

88 7 

90.0 

' 91.1 

915 

917 

11 

92 3 


92 3 

i 91.3 

915 

91 7 

12 

88.4A 

90.6 

89.5 

i 91 3 

92 0 

92 2 

13 



916 

1 92 6 

92 5 

92.7 

14 

90.9 

96.6A 

93.8 

92 6 

92 5 

92.7* 

15 

95.6 


95.6 

92 6 

92 5 

92.7 

16 

90.8 

94,0 

92 4 

93 9 

93 0 

93 2 

17 

88.9 

91.2 

90.0 

93 2 

93 0 

93 2 

IS 

97 3 


97.3 

92 3 

93 0 

93.2 

19 

94.1A 

89.0 

91.6 

92.9 

93.0 

93.2 
TABLE 126 (Continued) 

20 



92.9 

92.9 

93 0 

93.2 

21 

91.3 

97 lA 

94.2 

92 0 

92 5 

92 7-^ 

22 

85 9 


85 9 

92 2 

92 0 

92 2 

23 

90 3 

92 9 

91 6 

92.6 

91.5 

91.7 

24 

90.7 

93 2 

92 0 

89.9 

91 0 

91.2 

25 

97.6 


97 6 

89 9 

90 0 

90 2 

26 

95 8A 

76 2 

86.0 

86 6 

86 5 

86.7 

27 



819 

819 

82 0 

82.2 

28 

77 8 


77 8 

819 

82 0 

82.2=" 

29 

741 


74 1 

82 5 

82 5 

82 7 

30 

89 1 

86 6 

87 8 

85 0 

85 0 

85 2 

31 

94 7 

92 6 

93 6 

87 6 

87.5 

87 7 

Total 




36,419.5 

36,493.8‡ 


A From column 6, Table 125. 

* These index numbers are for weeks in 1936. See also Table 127. 

† Mean of middle three of five arrayed items. 

# Correction factor: 1.00221 = 36,500.0 ÷ 36,419.5. 

‡ Discrepancy between 36,500.0 and 36,493.8 is due to rounding. It represents about .02 of one per cent. 
Source: See Table 125. 


refer to the years 1927 and 1936. Table 125 would include the other 
values, also, if all intervening years were shown. (In Table 126, January, 
February, and December only are shown.) It should be noticed that in 
some instances there will be more than one value for a particular day; 
in other instances there will be none. 

7. Obtain one value for each day of each month. This is done in column 3 
of Table 126 by averaging when there are two or more values, and 
interpolating when there are none. 

8. Smooth these values by a moving average. In column 4 of Table 126 
a modified mean (the middle three of five arrayed items) was used. Using 
a small number of items makes for flexibility, while the modified mean 
makes for smoothness by toning down the influence of extremes, which 
are likely to be great with weekly data. The type of moving average to 
use, however, must be decided separately for each series. Chart 197 
shows by dots the percentages of column 2 of Table 126. The thin broken 
line represents the moving modified mean. 

9. Further smooth the values by inspection. This is shown by the heavy 
solid line of Chart 197. The logic of this double smoothing process is 
that, since there are only one or two items for any day, irregularities cannot 





TABLE 127 

Seasonal Adjustment of Weekly Movements of Rate of Steel Operations in 
the United States, 1936 

Week ended      Per cent of    Weekly       Deseasonalized data 
Monday:         capacity       seasonal*    [Col. 2 ÷ Col. 3] 
(1)             (2)            (3)          (4) 

January 6         48.0           88.7         54.1 
        13        51.0           95.2         53.6 
        20        51.0           99.7         51.2 
        27        51.0          103.2         49.4 
February 3        50.5          105.2         48.0 
        10        52.0          108.2         48.1 
        17        53.0          109.7         48.3 
        24        54.0          109.7         49.2 
March 2           55.0          110.7         49.7 
        9         56.0          111.7         50.1 
        16        58.0          112.2         51.7 
        23        50.5          109.2         46.2 
        30        59.0          111.7         52.8 
April 6           63.0          110.7         56.9 
        13        66.0          112.2         58.8 
        20        70.0          114.3         61.2 
        27        70.5          114.8         61.4 
May 4             70.0          112.7         62.1 
        11        69.0          112.2         61.5 
        18        69.0          110.7         62.3 
        25        68.5          109.7         62.4 
June 1            68.5          106.7         64.2 
        8         69.5          104.7         66.4 
        15        70.5          101.7         69.3 
        22        71.5           99.2         72.1 
        29        71.5           98.2         72.8 
July 6            65.5           91.7         71.4 
        13        67.0           92.2         72.7 
        20        70.0           95.7         73.1 
        27        72.0           97.2         74.1 
August 3          72.0           95.2         75.6 
        10        71.5           94.2         75.9 
        17        70.5           94.2         74.8 
        24        72.5           93.7         77.4 
        31        72.5           91.7         79.1 
September 7       69.0           88.7         77.8 
        14        71.0           92.7         76.6 
        21        73.5           93.2         78.9 
        28        74.5           95.7         77.8 
October 5         75.5           95.7         78.9 
        12        75.5           96.2         78.5 
        19        75.0           94.7         79.2 
        26        74.0           93.7         79.0 
November 2        74.0           94.2         78.6 
        9         74.5           94.2         79.1 
        16        74.5           93.2         79.9 
        23        74.5           93.2         79.9 
        30        75.0           91.7         81.8 
December 7        77.0           91.2         84.4 
        14        80.0           92.7         86.3 
        21        81.0           92.7         87.4 
        28        68.0           82.2         82.7 

* These are the starred items of Table 126, column 6. 
Source: Tables 125 and 126. 
completely cancel out, even by the use of a moving modified mean, such 
as is used in step 8. 

10. Adjust the values to total 36,500. This must be done because the 
seasonal index should average 100, and there are 365 days in a year 
(except leap year). The method is to divide 36,500 by the total of the values 
of column 5 (Table 126) and multiply each value by the quotient. We 
now have a seasonal index number for a week ending on every day of 
the year. Chart 197 does not show a separate line for the final weekly 
seasonal, since it would be indistinguishable from the solid line. 
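
The arithmetic of steps 3 to 5, 8, and 10 may be set down as a short Python sketch. The function names are illustrative only and form no part of the method; the figures are those printed in Tables 125 and 126, and the graphic smoothing of step 9 is not attempted.

```python
from statistics import mean


def deseasonalize(monthly_avg, seasonal_index):
    """Step 3: divide a monthly average by its seasonal index (per cent)."""
    return monthly_avg / seasonal_index * 100.0


def interpolate_weekly(start, end, n_weeks):
    """Step 4: straight-line values for the weeks lying between two
    monthly anchors that are n_weeks apart (anchors themselves excluded)."""
    step = (end - start) / n_weeks
    return [start + step * k for k in range(1, n_weeks)]


def percent_of_trend_cycle(weekly_value, trend_cycle):
    """Step 5: original weekly figure as a percentage of Trend X Cycle."""
    return weekly_value / trend_cycle * 100.0


def modified_mean(window):
    """Step 8: mean of the middle three of five arrayed (sorted) items."""
    if len(window) != 5:
        raise ValueError("expected a window of five items")
    return mean(sorted(window)[1:4])


def moving_modified_mean(values):
    """Apply modified_mean to each centred window of five (ends dropped)."""
    return [modified_mean(values[i - 2:i + 3])
            for i in range(2, len(values) - 2)]


def scale_to_total(values, target=36500.0):
    """Step 10: multiply every value by target / sum(values)."""
    factor = target / sum(values)
    return factor, [v * factor for v in values]


# Table 125, early 1927: starred Trend X Cycle items and the weeks between.
jan_anchor = round(deseasonalize(76.5, 96.6), 1)    # week of January 17
feb_anchor = round(deseasonalize(82.6, 108.0), 1)   # week of February 21
between = [round(v, 1) for v in interpolate_weekly(jan_anchor, feb_anchor, 5)]
jan24_pct = round(percent_of_trend_cycle(76.5, between[0]), 1)

# Table 126, column 3, February 10-14: yields the column 4 entry for February 12.
feb12 = round(modified_mean([107.6, 108.0, 107.0, 109.8, 105.2]), 1)

# Correction factor from the column 5 total of Table 126.
correction = round(36500.0 / 36419.5, 5)
```

Rounding is applied at the same points as in the tables, so the intermediate figures agree with the printed ones to the last digit shown.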

Chart 142 affords a comparison between the monthly and weekly 
seasonal indexes* of these data. It is apparent that the monthly index is not 
sufficiently flexible to be used with weekly data. 

Chart 198. Rate of Steel Operations in the United States Before and After Adjustment 
for Seasonal Variation, by Weeks, 1936. (Data of Table 127.) 

The deseasonalizing of weekly data presents no new problem. The 
process is illustrated by Table 127, for the year 1936. The seasonal index 
numbers of column 3 (Table 127) are taken from column 6 of Table 126. 
The original data are now divided by these numbers and the results re- 
corded in Table 127, column 4. They are shown graphically in Chart 198. 
Note that the larger irregularities are reduced somewhat, and the improve- 
ment in steel operations during the year is more marked than is apparent 
from the original data. 
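
As a small illustration of this division, the following Python sketch reproduces the January 1936 entries of Table 127. The function name is an assumption made for the example; only the figures come from the table.

```python
def deseasonalize_weekly(per_cent_of_capacity, weekly_seasonal):
    """Divide each weekly figure by its weekly seasonal index (per cent),
    rounding to one decimal as in Table 127, column 4."""
    return [round(x / s * 100.0, 1)
            for x, s in zip(per_cent_of_capacity, weekly_seasonal)]


# Weeks ended January 6, 13, 20, and 27, 1936 (Table 127, columns 2 and 3).
january_rate = [48.0, 51.0, 51.0, 51.0]
january_seasonal = [88.7, 95.2, 99.7, 103.2]

january_adjusted = deseasonalize_weekly(january_rate, january_seasonal)
print(january_adjusted)
```

The unchanged rate of 51.0 in three successive weeks becomes a declining adjusted series, because the seasonal index rises through the month.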
CHAPTER XIX 

CYCLICAL MOVEMENTS 


Five chapters on time series analysis have been presented for the reader's 
study. Chapter XIV explained that economic time series were typically 
the product of secular trend (T), cyclical movements (C), seasonal 
variations (S), and irregular fluctuations (I). Chapters XV and XVI were 
devoted to a consideration of types of trends, how to select the appropriate 
type, methods of trend fitting, and how to remove statistically from the 
data the effect of trend. Chapters XVII and XVIII, similarly, considered 
types of seasonal variations, their measurement and elimination. In this 
chapter we shall consider methods of measuring the time series movement 
that is probably of most interest to economists: cyclical variation. 

There are several methods which are sometimes used, and we shall 
consider them in the following order: (1) residual method; (2) direct method; 
(3) harmonic analysis; (4) method of cyclical averages. 

Residual Method 

This is the orthodox method, and the one most commonly used. It 
consists in successively eliminating seasonal and trend from the data, thus 
obtaining cyclical-irregular movements, and then perhaps further 
smoothing the results to obtain the cyclical movements, or cyclical relatives, as 
they are sometimes designated. It will be recalled that in Chapter XVII, 
Table 116 illustrated the elimination of seasonal by division of original 
data by the seasonal index. Consequently this process need not be 
repeated here. Since the seasonal index numbers are percentages averaging 
100 per cent each year, the deseasonalized data are in terms of original 
units and of approximately the same magnitude. Cyclical-irregular 
movements are now obtained by dividing the seasonally adjusted data month 
by month by the trend values, obtaining percentages, as in column 4 of 
Table 128. Trend values are obtained by the process described on pages 
395-411, using the trend equation shown at the top of page 409. The 
cyclical movements of column 5 are the result of smoothing the data of 
column 4 by a moving average, as will shortly be explained. 
TABLE 128 

Computation of Cyclical Movements from Deseasonalized United States 
Magazine Advertising Data, 1921-1937 


Year and month 
(1) 

Deseasonalized 
data 
(2) 

Trend values 
(3) 

Cyclical-irregular 
movements 
(per cent) 
[Col. 2 ÷ Col. 3] 
(4) 

Cyclical relatives 
(per cent) 
3-month binomial 
moving average 
of column 4 
(5) 

1921 





January 

2,334 

2,166 

107 8 


February 

2,038 

2,176 

93 7 

95 3 

March 

1,881 

2,186 

86 0 

86 6 

April 

1,776 

2,196 

80.9 

83.4 

May 

1,890 

2,206 

85 7 

84 3 

June 

i;884 

2,216 

85 0 

85 6 

July 

1,928 

2,226 

86 6 

86 3 

August 

1,947 

2,236 

87 1 

85 0 

September 

i;776 

2,246 

79 1 

79 4 

October 

i;630 

2,255 

72 3 

74 3 

November 

i;668 

2,265 

73 6 

73 6 

December . . . . 

i;700 

2,275 

74 7 

76.8 

1922 





January 

1,925 

2,285 

84,2 

80 6 

February 

1,819 

2,295 

79 3 

80 2 

March 

1,803 

2,305 

78 2 

78 8 

April 

i;837 

2,315 

79 4 

80 2 

May 

1,952 

2,325 

84 0 

83 2 

June 

1,994 

2,335 

85 4 

86 0 

July 

2,089 

2,345 

891 

89 0 

August . . . . 

. 2,175 

2,354 

92 4 

91.0 

September 

2,127 

2,364 

90 0 

914 

October . 

2,207 

2,374 

93 0 

91.6 

November 

2,161 

2,384 

90 6 

92.8 

December 

2,318 

2,394 

96.8 

96.7 

1923 





January 

2,468 

2,404 

102 7 

100.1 

February 

2,367 

2,414 

981 

99.5 

March 

2,399 

2,424 

99 0 

99.8 

April 

May 

2,507 

2,434 

103 0 

101.9 

2,511 

2,444 

2,454 

102 7 

103.2 

June 

2,565 

104.5 

104 7 

July 

2,635 

2,463 

107 0 

i 105 8 

August 

2,589 

2,473 

1 104.7 

f 104.2 

September 

2,497 

2,483 

100 6 

102 2 

October 

2,567 

2,493 

103,0 

102.0 

November 

2,542 

2,503 

101.6 

102.5 

December 

2,609 

2,513 

103.8 

104.0 

1924 ' 





January 

2,690 

2,523 

106.6 

105.2 

February 

2,631 

2,533 

103.9 

105.5 

March 

2,735 

2,543 

107 6 

106 9 

April 

May 

2,769 

2,553 

108.5 

106.8 

2,633 

2,562 

102.8 

105.3 

June 

2,758 

2,572 

107.2 

103.2 

July 

2,472 

2,526 

2,582 

2,592 

95.7 

99.0 

August 

97.5 

96.3 

September 

2,458 

2,602 

94.5 

95.4 

October 

2,483 

2,612 

95.1 

95.4 

November 

2,542 

2,622 

969 

97.6 

December 

2,673 

2,632 

101.6 

98.6 


TABLE 128 (Continued) 

1929 





January 

3,165 

3,117 

101.5 

100.7 

February 

3,249 

3,127 

103.9 

104.2 

March . . 

3,378 

3,137 

107 7 

107.2 

April . . . 

3;453 

3,147 

109 7 

108 8 

May 

3,414 

3,157 

1081 

108 8 

June 

3,457 

3,167 

109.2 

109.2 

July 

3;510 

3,177 

110 5 

109 0 

August . . . 

3,375 

3,186 

105 9 

107.7 

September . . . . 

3,467 

3,196 

108 5 

106.9 

October. . . . 

3,360 

3,206 

104.8 

105.6 

November 

3;355 

3,216 

3,226 

104.3 

104.7 

December .... 

3,401 

105.4 

104.6 

1930 





January . 

3,336 

3,236 

1031 

102.1 

February 

3,143 

3,246 

96 8 

98.6 

March . 

3,178 

3,256 

97 6 

97.0 

April 

3,137 

3,266 

96.1 

95.2 

May, . . . . 

2,983 

3,276 

911 

92.6 

June .... 

3,019 

3,286 

919 

91.0 

July 

2,942 

3,295 

89 3 

89.3 

August 

2,869 

3,305 

86.8 

88.1 

September . 

2,962 

3,315 

89.4 

87.4 

October . ... 

2,800 

3,325 

84 2 

85.1 

November ... 

2,753 

3,335 

82.5 

82.7 

December 

2,730 

3,345 

81.6 

81.3 

1931 





January 

2,664 

3,355 

794 

79.7 

February 

2,639 

3,365 

78.4 

78.1 

March 

2,569 

3,375 

761 

75.7 

April 

May 

2,448 

3,385 

72 3 

73.1 

2,435 

3,394 

71.7 

72.0 

June 

2,459 

3,404 

72.2 

71.6 

July 

2,399 

3,414 

70 3 

70.6 

August 

2,389 

3,424 

698 

69,6 

September 

2,359 

3,434 

68 7 

68.5 

October 

2,298 

3,444 

66 7 

66.5 

November 

2,212 

3,454 

640 

63.8 

December 

2,101 

3,464 

60.7 

61.4 

1932 





January 

2,091 

3,474 

60.2 

60.2 

February 

2,079 

3,484 

59.7 

59.4 

March 

2,032 

3,494 

58.2 

57.6 

April 

May 

1,900 

3,503 

54.2 

54.9 

1,867 

3,513 

58.1 

52.2 

June 

1,713 

3,523 

48 6 

49.4 

July 

1,673 

3,533 

1 47 4 

47.4 

August 

1,636 

3,543 

1 46 2 

45.4 

September 

1,494 

3,553 

1 420 

43,0 

October 

1,489 

3,563 

41.8 

42.5 

November 

1,587 

3,573 

444 

43.7 

December 

1,589 

3,583 

443 

43.6 


TABLE 128 (Continued) 

1933 





January . . 

1,486 

3,593 

41.4 

42 5 

February .... 

1,549 

3,602 


42.4 

March 

1,516 

3,612 

42 0 

41 4 

April 

1,399 

3,622 

38 6 

39.6 

May 

1,420 


391 

38.8 

June 

1,390 

3,642 

38 2 

39.3 

July 

1,527 


41.8 

41.7 

August 

i;651 

3,662 

45.1 

43.9 

September . . 

1,604 

3,672 

43.7 

449 

October , . . 

1,733 

3,682 

471 

46.1 

November. . . . 

1,719 ‘ 

3,692 

46.6 

46.8 

December . . . 

1,734 

3,702 

46.8 

47.4 

1934 





January 

1,831 

3,711 

49.3 

48 7 

February 

1,835 

3,721 

49 3 

49.5 

March 

1,873 

3,731 

50.2 

50 8 

April 

May 

1,998 

3,741 

53.4 

52 9 

2,050 

3,751 

54.7 

54.3 

June 

2,044 

3,761 

543 

55.6 

July 

2,224 

3,771 

59.0 

57.2 

August 

2,139 

3,781 

56.6 

56.8 

September 

2,083 

3,791 

54.9 

55.4 

October 

2,098 

3,801 

55.2 

55.1 

November 

2,097 


55.0 

54.8 

December 

2,068 


54.1 

54.6 

1935 





January 

2,105 



54.6 

February 

2,094 

3,840 

54.5 

54.8 

March 

2,117 



55.3 

April 

May 

2,184 


56.6 

55.9 

2,146 


55 5 

55.4 

June. 

2,102 


54.2 

55 1 

July 

2,198 


56 5 

55 2 

August 

2,088 

3,900 

53.5 

54.1 

September 

2,066 

3,910 

52 8 

52.7 

October 

2,021 

3,919 

51.6 

51.7 

November 

1,992 

3,929 


52.6 

December 

2,259 

3,939 

57.3 

55.6 

1936 





January 

2,258 

3,949 

57.2 

56.9 

February 

2,212 

3,959 

65.9 

57.0 

March 1 

2,336 

3,969 

58.9 

58.0 

April 

May 

2,314 

3,979 

• 58 2 

58.5 

2,338 

3,989 

58.6 

58 7 

Jime 

2,374 


59.4 

59.1 

July 

2,361 


58.9 


August 

2,364 

4,018 

58.8 

58.9 

September 

2,376 

4,028 


59.3 

October 

2,444 


60.5 


November 

2,476 


61.2 


December ^ 

2,644 . 


65.2 

64.5 


TABLE 128 (Continued) 

1937 

January 

2,704 

4,068 

4,078 

66.5 

64 8 

February 

2,494 

61 2 

62 9 

March 

2,569 

4,088 

62.8 

62.5 

April 

May 

2;594 

4,098 

63.3 

63 6 

2,670 

4,108 

65 0 

64.8 

June ... 

2,721 

4,118 

66.1 

65 6 

July 

2,683 

4,127 

65 0 

66 0 

August . ... 

2,815 

4,137 

680 

66 6 

September. . . 

2,717 

4,147 

65 5 

65 6 

October 

2,643 

4,157 

63 6 

64 4 

November. . . 

2,705 

4,167 

649 

65.1 

December 

2,801 

4,176 

67 1 



Source: Deseasonalized data are from Table 116; trend equation is from Table 89. 

Remembering our assumption that original data = $T \times C \times S \times I$, we 
may describe our process algebraically as follows: 

Deseasonalized data: 

$$\frac{T \times C \times S \times I}{S} = T \times C \times I.$$ 

Cyclical-irregular movements: 

$$\frac{T \times C \times I}{T} = C \times I.$$ 

The data before and after removing trend are shown by the solid lines of 
Chart 199, parts A and B. 

It makes no difference whether seasonal is eliminated first, and then 
trend, or whether the order of elimination is reversed. Thus we may 
write: 

Data adjusted for trend: 

$$\frac{T \times C \times S \times I}{T} = C \times S \times I.$$ 

Cyclical-irregular movements: 

$$\frac{C \times S \times I}{S} = C \times I.$$ 

Either of these variations of procedure might be called a method of 
successive elimination, since $T$ and $S$ are successively eliminated, leaving 
$C \times I$ as a residual. 

Still a third variation of the residual method is possible. The term 
"normal" is frequently and variously used in economics and statistics. 
Thus, from a long-run point of view it is normal for industry to increase 
steadily, and from a short-run viewpoint it is normal for business to vary 
with the season of the year. A more comprehensive view is that both 
movements together are "normal." Thus, defining normal as $T \times S$, we 
may obtain percentages of normal by dividing the original data by $T \times S$, 
and so obtain cyclical-irregular movements. 

Cyclical-irregular movements: 

$$\frac{T \times C \times S \times I}{T \times S} = C \times I.$$ 

Chart 199. Deseasonalized Data, Trend, Cyclical-Irregular Movements, and Cycles, 
United States Magazine Advertising, 1921-1937. (Data of Table 128.) 

The three variations of the residual method are illustrated for the year 
1936 in the three sections of Table 129. Note that, except for an 
occasional minor discrepancy in the last digit (due to rounding), the final 
results are identical for each procedure. Which of the three methods to 
use is a matter of convenience. Probably the first method is the most 




TABLE 129 

Three Alternative Methods of Deriving Cyclical-Irregular Movements in 
United States Magazine Advertising, 1936 

(Original data in thousands of lines) 

Method A 

Year and      Original data   Seasonal index   Deseasonalized data   Trend     Cyclical-irregular 
month         T X C X S X I   (per cent)       T X C X I             values    percentages C X I 
                              S                [(T X C X S X I) ÷ S] T         [(T X C X I) ÷ T] 

January         1,696            75.1            2,258                3,949      57.2 
February        2,128            96.2            2,212                3,959      55.9 
March           2,511           107.5            2,336                3,969      58.9 
April           2,860           123.6            2,314                3,979      58.2 
May             2,852           122.0            2,338                3,989      58.6 
June            2,637           111.1            2,374                3,999      59.4 
July            1,967            83.3            2,361                4,009      58.9 
August          1,695            71.7            2,364                4,018      58.8 
September       2,084            87.7            2,376                4,028      59.0 
October         2,637           107.9            2,444                4,038      60.5 
November        2,736           110.5            2,476                4,048      61.2 
December        2,731           103.3            2,644                4,058      65.2 


Method B 

Year and      Original data   Trend     Per cent of trend      Seasonal index   Cyclical-irregular 
month         T X C X S X I   values    C X S X I              (per cent)       percentages C X I 
                              T         [(T X C X S X I) ÷ T]  S                [(C X S X I) ÷ S] 

January         1,696          3,949      42.9                    75.1            57.1 
February        2,128          3,959      53.8                    96.2            55.9 
March           2,511          3,969      63.3                   107.5            58.9 
April           2,860          3,979      71.9                   123.6            58.2 
May             2,852          3,989      71.5                   122.0            58.6 
June            2,637          3,999      65.9                   111.1            59.3 
July            1,967          4,009      49.1                    83.3            58.9 
August          1,695          4,018      42.2                    71.7            58.9 
September       2,084          4,028      51.7                    87.7            59.0 
October         2,637          4,038      65.3                   107.9            60.5 
November        2,736          4,048      67.6                   110.5            61.2 
December        2,731          4,058      67.3                   103.3            65.2 


Method C 

Year and      Original data   Trend     Seasonal index   "Normal"   Cyclical-irregular percentages 
month         T X C X S X I   values    (per cent)       values     [(T X C X S X I) ÷ (T X S)] 
                              T         S                T X S 

January         1,696          3,949      75.1            2,966       57.2 
February        2,128          3,959      96.2            3,809       55.9 
March           2,511          3,969     107.5            4,267       58.8 
April           2,860          3,979     123.6            4,918       58.2 
May             2,852          3,989     122.0            4,867       58.6 
June            2,637          3,999     111.1            4,443       59.4 
July            1,967          4,009      83.3            3,339       58.9 
August          1,695          4,018      71.7            2,881       58.8 
September       2,084          4,028      87.7            3,533       59.0 
October         2,637          4,038     107.9            4,357       60.5 
November        2,736          4,048     110.5            4,473       61.2 
December        2,731          4,058     103.3            4,192       65.1 


Source: Original data are from Table 108. Other values are computed from those data. 
common, since it may frequently be desired to study separately the 
seasonally adjusted data, but only rarely data adjusted for trend alone. On 
the other hand, if a seasonal index is to be obtained by averaging 
percentages of trend, it is convenient to eliminate trend at the outset. However, 
if the sole object of the analysis is to obtain cyclical relatives, it will be 
easiest to utilize the third method, which substitutes a multiplication for 
a division. Chart 199B shows by the solid line the cyclical-irregular 
movements which result from any of these procedures. 
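
The three variations of the residual method may be compared in a brief Python sketch, here applied to the January 1936 figures of Table 129. The function names are hypothetical, and rounding is applied where the table rounds, which is why Method B differs from the others by one unit in the last digit.

```python
def method_a(original, seasonal, trend):
    """Eliminate seasonal first, then trend."""
    deseasonalized = round(original / seasonal * 100)      # T x C x I
    return round(deseasonalized / trend * 100, 1)          # C x I, per cent


def method_b(original, seasonal, trend):
    """Eliminate trend first, then seasonal."""
    per_cent_of_trend = round(original / trend * 100, 1)   # C x S x I
    return round(per_cent_of_trend / seasonal * 100, 1)    # C x I, per cent


def method_c(original, seasonal, trend):
    """Divide by 'normal' (T x S) in a single step."""
    normal = round(trend * seasonal / 100)                 # T x S
    return round(original / normal * 100, 1)               # C x I, per cent


def unrounded(original, seasonal, trend):
    """The same quantity with no intermediate rounding, for comparison."""
    return original / (trend * seasonal / 100) * 100


# January 1936: original data 1,696; seasonal index 75.1; trend value 3,949.
january = (1696, 75.1, 3949)
results = (method_a(*january), method_b(*january), method_c(*january))
print(results)
```

Without the intermediate rounding the three procedures are algebraically identical, as the `unrounded` helper shows.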

Reducing minor irregularities. Although the curve of Chart 199B is 
remarkably regular, there are still a few minor irregularities which appear. 
These are in the main attributable to the interplay of a multitude of forces 
other than those being analyzed. To a slight degree they may be due to 
the fact that our seasonal index is not perfect. There is no entirely satis- 
factory method of eliminating these fluctuations. However, by the use 
of a moving average the curve can be smoothed so as to bring the cyclical 
movements into clearer relief. If the analyst is interested primarily in 
the combined trend and cycle, the seasonally adjusted data rather than 
the cyclical-irregular movements should be smoothed. If it is later desir- 
ed to eliminate trend, the equation may be determined from the smoothed 
data. 

A number of alternatives are open to the analyst in the choice of mov- 
ing averages. Probably the most commonly used is a simple 3-month 
moving average. It frequently happens, on the other hand, that such a 
moving average introduces small inverse fluctuations into the series. This 
difficulty can be overcome and a smoother curve obtained by the use of a 
5-month average. Such an average may, however, be too smooth; it may 
iron out turning points that are significant. In the present study a bi- 
nomially weighted 3-month moving average (central item weighted double) 
has been used in order to attain maximum sensitivity as well as smooth- 
ness. The reader is already familiar with the computation of moving 
averages in general, and binomially weighted moving averages in particular 
(see pages 421-426) ; consequently, the mechanics of computation will not 
be further discussed. Results are shown in column 5 of Table 128 and 
by the dotted line of Chart 199B. 

In general it may be said that the moving average to be chosen depends 
on (1) the irregularity of the data (in respect to both amplitude and 
duration), and (2) the extent to which it is desired to smooth the data. If the 
data are not very irregular, 3 months may be sufficient. The more 
irregular are the data, the larger is the number of months required. But the 
larger the number of months taken, the less flexible is the moving average. 
For very irregular data the writers have found that a 5-month average 
weighted 1, 2, 4, 2, 1 gives a smooth, sensitive curve, without excessive 
labor. Note that the sum of the weights is 10, which eliminates a division. ^ 
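
A weighted moving average of either kind can be sketched in a few lines of Python. The function name and its arguments are assumptions made for illustration; the data are the first six cyclical-irregular percentages of Table 128 (January to June 1921).

```python
def weighted_moving_average(values, weights):
    """Centred moving average with an odd number of weights.

    Returns one value per interior point; the first and last
    len(weights) // 2 items of the series have no average.
    """
    if len(weights) % 2 == 0:
        raise ValueError("use an odd number of weights so the average is centred")
    half = len(weights) // 2
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i - half:i + half + 1])) / total
        for i in range(half, len(values) - half)
    ]


# Table 128, column 4, January-June 1921.
cyclical_irregular = [107.8, 93.7, 86.0, 80.9, 85.7, 85.0]

# Binomially weighted 3-month average (central item weighted double).
binomial_3 = weighted_moving_average(cyclical_irregular, [1, 2, 1])

# 5-month average weighted 1, 2, 4, 2, 1 (weights sum to 10).
weighted_5 = weighted_moving_average(cyclical_irregular, [1, 2, 4, 2, 1])
```

The first entry of `binomial_3` is the February 1921 value, (107.8 + 2 × 93.7 + 86.0) ÷ 4 = 95.3, which agrees with column 5 of Table 128.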

Comparison of cyclical movements. One reason for wishing to isolate 
cyclical movements is that they may be compared with cyclical movements 
of other series. Possibly it may be discovered that one series consistently 
precedes the other in its turning points, and thus the cyclical movements 
of the former may be used to forecast those of the latter. Difficulty is 
experienced, however, in visualizing any time relationships between the 
cycles of the series if the amplitudes of their fluctuations differ appreciably. 
Thus, from Chart 200A it appears that pig iron production fluctuates 
about twice as violently as does United States magazine advertising. Be- 
cause of this fact the two curves do not lie close to each other throughout, 
and comparison of their movements is a little difficult. 

A simple remedy for this difficulty is to use for magazine advertising a 
scale about twice as large as that chosen for pig iron production. If a 
considerable degree of accuracy is required, a more satisfactory procedure 
is to adjust the two series so that they will have the same amplitude, and 
use only one scale for the two series. The measure of amplitude usually 
chosen is the standard deviation. 

The customary procedure for making this amplitude adjustment is illus- 
trated in Table 130. First, each series is converted into percentage de- 
viations from normal, by subtracting 100 from each item. Next, the 
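standard deviation of each series is computed by the usual formula,

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2},$$

where $d$ is a deviation from 100 and $N$ is the number of months.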


¹ For some purposes it may be desirable not merely to reduce minor irregularities,
but as nearly as possible to completely eliminate all irregular movements, leaving only
cycles. For such purposes rather complicated moving averages are occasionally used.
Thus, the cyclical movements of Chart 143A were based upon a moving average weighted
as follows: -1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, 57, 47, 33, 18, 6, -2, -5, -5, -3,
-1. A characteristic of this particular weight pattern is that, if it is fitted to a second
(or third) degree curve of simple polynomial series, it will fall exactly on that curve.
Nevertheless, the results are not so smooth as might be desired, nor is it sufficiently
flexible at the cyclical turning points. (Smoothness is sometimes measured by taking
the sum of the squares of the third differences. The smaller the sum, the smoother is
the series.)

The computation of this moving average is not so difficult as might be expected.
The procedure to be applied to deseasonalized data is as follows: Take a 5-month
moving total of a 5-month moving total of a 7-month moving total. Take a weighted
7-month moving total of the results with weights as follows: -1, 0, 1, 2, 1, 0, -1.
Divide by 350 (= 5 × 5 × 7 × 2). The reader might at this point refer to pages
500-501, where a 43-term moving average for use with data that have not been de-
seasonalized was described. That moving average removes seasonal as well as irregular
movements. Experimentation with the 43-term formula, however, leads the writers
to the conclusion that it smooths out too much of the amplitude of cycles that are very
short or have very sharp turning points, and sometimes does not coincide well with
cyclical turning points. For further discussion, see Frederick R. Macaulay, The Smooth-
ing of Time Series, National Bureau of Economic Research, New York, 1931.
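The successive moving totals of this note can be multiplied out mechanically. The Python sketch below (function and variable names are illustrative) combines the three simple moving totals with the weighted 7-month total, giving the 21 weights and their divisor. It then checks on a made-up third-degree curve that the average falls exactly on the curve.

```python
def convolve(a, b):
    """Weights of the moving total obtained by applying `a` and then `b`."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# 5-month total of a 5-month total of a 7-month total, then the weighted total.
steps = [[1] * 5, [1] * 5, [1] * 7, [-1, 0, 1, 2, 1, 0, -1]]
weights = [1]
for step in steps:
    weights = convolve(weights, step)

divisor = sum(weights)


def smooth_at(series, center):
    """Weighted average of the 21 items centered on position `center`."""
    half = len(weights) // 2
    window = series[center - half:center + half + 1]
    return sum(w * v for w, v in zip(weights, window)) / divisor


# A made-up third-degree curve; interior points should be reproduced exactly.
cubic = [x ** 3 - 2 * x ** 2 + 5 for x in range(40)]
print(weights, divisor, smooth_at(cubic, 20), cubic[20])
```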
Even though the cycles have been put in deviation form, the deviations
do not cancel out exactly; hence the correction factor in the above formula
is necessary. Finally each series is expressed in units of its standard



Chart 200. Cyclical Movements of United States Magazine Advertising and of Pig
Iron Production (A) as Percentages and (B) as Deviations in Units of Their Standard
Deviations, 1921-1937. (Each series has been smoothed by a binomially weighted
3-month moving average. For magazine advertising data see Table 130; for source of
pig iron production, original data, see Chart 139.)


deviation by dividing by its standard deviation. In the present instance 
we find for United States magazine advertising: 


$$\sigma = \sqrt{\frac{173{,}828.54}{202} - \left(\frac{-3{,}885.8}{202}\right)^2} = 22.15.$$


This compares with a value for σ of 32.25 for pig iron production. There-
fore, when the cyclical deviations of magazine advertising are divided by
22.15 and those of pig iron by 32.25, the variations in the latter will be reduced a
relatively large amount. The amplitude of fluctuation of the series will
be more nearly the same; furthermore, each now has the same degree of
variability (that is, each has a standard deviation of unity).

Inspection of Chart 200B, which has a scale running from -4σ to +3σ,

TABLE 130

Calculation of Cyclical Deviations of United States Magazine Advertising
in Units of Standard Deviations, 1921-1937

                    Cyclical      Deviations        Squared        Deviations
Year and month      relatives     from 100          deviations     in terms of σ
                    (per cent)    [Col. 2 - 100]                   [Col. 3 ÷ 22.15]
      (1)              (2)            (3)              (4)              (5)

1921:
  January              ...            ...              ...              ...
  February             95.3          - 4.7            22.09            -0.21
  March                86.6          -13.4           179.56            - .60
  April                83.4          -16.6           275.56            - .75
  May                  84.3          -15.7           246.49            - .71
  June                 85.6          -14.4           207.36            - .65
  July                 86.3          -13.7           187.69            - .62
  August               85.0          -15.0           225.00            - .68
  September            79.4          -20.6           424.36            - .93
  October              74.3          -25.7           660.49            -1.16
  November             73.6          -26.4           696.96            -1.19
  December             76.8          -23.2           538.24            -1.05

1937:
  January              64.8          -35.2         1,239.04            -1.59
  February             62.9          -37.1         1,376.41            -1.67
  March                62.5          -37.5         1,406.25            -1.69
  April                63.6          -36.4         1,324.96            -1.64
  May                  64.8          -35.2         1,239.04            -1.59
  June                 65.6          -34.4         1,183.36            -1.55
  July                 66.0          -34.0         1,156.00            -1.53
  August               66.6          -33.4         1,115.56            -1.51
  September            65.6          -34.4         1,183.36            -1.55
  October              64.4          -35.6         1,267.36            -1.61
  November             65.1          -34.9         1,218.01            -1.58
  December             ...            ...              ...              ...

Total                  ...        -3,885.8       173,828.54             ...

Source: Table 128.


verifies the fact that each series now has about the same amplitude. This 
type of chart is frequently referred to as a cycle chart, since its object is 
to facilitate comparison of cycles. It is easily observable from this chart 
that the turning points of pig iron typically occur before those of magazine 
advertising. The chart, of course, does not imply that pig iron produc-
tion results in magazine advertising; in itself the chart gives no clue to
the causal relationship.
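The amplitude adjustment can be retraced with a few lines of Python. The sketch below uses the totals of Table 130 for magazine advertising; the function names are illustrative, and the made-up series at the end serves only to show that a series expressed in units of its own standard deviation has a standard deviation of unity.

```python
import math


def standard_deviation(deviations):
    """sqrt(sum(d^2)/N - (sum(d)/N)^2), with d measured from 100."""
    n = len(deviations)
    mean_square = sum(d * d for d in deviations) / n
    correction = (sum(deviations) / n) ** 2   # needed because sum(d) is not zero
    return math.sqrt(mean_square - correction)


def in_sigma_units(deviations, sigma):
    """Express each deviation in units of the standard deviation."""
    return [d / sigma for d in deviations]


# Totals of Table 130 (202 months of magazine advertising).
n_months = 202
sum_d = -3885.8
sum_d_squared = 173828.54
sigma_advertising = math.sqrt(sum_d_squared / n_months - (sum_d / n_months) ** 2)

# First three entries of column 3, converted as in column 5.
first_rows = in_sigma_units([-4.7, -13.4, -16.6], round(sigma_advertising, 2))

# Made-up deviations, for illustration only.
sample = [-12.0, 5.0, 20.0, -3.0, -30.0]
rescaled = in_sigma_units(sample, standard_deviation(sample))
print(round(sigma_advertising, 2), [round(v, 2) for v in first_rows])
```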


Direct Method 

The residual method is rather laborious if the sole object of the analysis
is to isolate cycles. A simpler and more direct method is desirable, but 
unfortunately no direct method has yet been devised which accomplishes 
this result with any great degree of perfection. 

For rough and ready reference, Chart 201, on which the unadjusted 



Chart 201. United States Magazine Advertising, by Months, 1932-1937. (Data of
Table 109, Col. 2.)

data have been plotted, has much to commend it. Although little idea
of the trend can be obtained (since if any considerable number of years
were included, the crossing and re-crossing of the lines would be very
confusing), the seasonal movement is unmistakable. More important for
our purposes, a rough idea of the cycles can be obtained by comparing the
level and slope of any line with those preceding and following it in point
of time. Thus the latter half of 1932 and the first half of 1933 appear as 
depression periods, and there appears also a minor recession in the middle 
of 1935. 

But a general purpose chart, such as Chart 201, though it gives a rough 
idea of several factors, fails to give a very precise idea of any one. A more 
exact comparison can be made by the use of a little simple arithmetic. 
Thus each month might be expressed as a percentage of the corresponding 
month in the preceding year.² The procedure, of course, is to divide each
January value by the January value for the year before; likewise for Feb-
ruary; and so on. This procedure roughly eliminates seasonal variation
and secular trend, though the general level of the percentages will be above
100 if the trend is upward, and below 100 if the trend is downward. The
results of this procedure are shown in section A of Chart 202. The “cycles”
are thrown into clear relief, but they are not the same sort of fluctuations
with which we are familiar. They represent not the cyclical level, but the
cyclical change. Thus the 1934 value is very high, though the usual cycli-
cal analysis (as shown by section B of this chart) indicates that in every
month of that year advertising was at least 40 per cent below normal.
The explanation is that 1933 was a year of extreme depression, and 1934
was high in comparison. Although the same movements may be detected
in sections A and B of this chart, they take an unusual form in section A, 
and the mental readjustment which must be made in interpreting the 
cycles of this section makes the procedure of doubtful utility to most per- 
sons and for most purposes. Another defect of this procedure is that 
irregularities in the data are magnified. For instance, the conjuncture of 
a minor irregular high in the given month and a minor irregular low in the 
corresponding month of the preceding year will produce rather a large 
positive variation in the derived series for the given month. 

A variation of this method which gives improved results is to express 
the data as a percentage of the average of the corresponding month for 
several of the preceding years. In section C of Chart 202, the three pre- 
ceding years are used. The number should coincide with the average 
length of the cycle in the series under consideration. As can be seen from 
the chart, this method gives results more like those of the orthodox method 
of section B than does the method shown in section A. Probably the chief 
objection to this method is that cycles are not uniform in duration or 
amplitude, and rather serious distortion of the data still results. There-
fore, if accurate results are required, the residual method is to be recom-
mended in preference to the direct method.

² This method was devised by M. A. Brumbaugh. See his Direct Method of Determin-
ing Cyclical Fluctuations of Economic Data, Prentice-Hall, Inc., New York, 1926.
Brumbaugh also adjusts for the small residue of trend which is left after the first series
of divisions.
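Both forms of the direct method amount to one division per month, as the following Python sketch suggests. The function name and the figures are hypothetical and serve only to show the arithmetic: with one preceding year the result is the percentage of the corresponding month a year earlier, and with three preceding years it is the percentage of their average.

```python
def percent_of_prior_years(monthly, years_back=1):
    """Each month as a per cent of the average of the same month
    in the `years_back` preceding years (None where no base exists)."""
    result = []
    for i, value in enumerate(monthly):
        if i < 12 * years_back:
            result.append(None)
            continue
        base = sum(monthly[i - 12 * k] for k in range(1, years_back + 1)) / years_back
        result.append(100 * value / base)
    return result


# Hypothetical series: four years, each month of a year at the same level.
monthly = [100] * 12 + [50] * 12 + [75] * 12 + [90] * 12

one_year = percent_of_prior_years(monthly, years_back=1)
three_years = percent_of_prior_years(monthly, years_back=3)
print(one_year[12], one_year[24], three_years[36])
```

In the third year the one-year comparison stands at 150 per cent although the level (75) is still well below that of the first year (100). This is the same effect as the high 1934 values described above: the figure measures cyclical change, not cyclical level.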

Harmonic Analysis 

After obtaining cyclical-irregular movements, an alternative method to 
smoothing by a moving average, or in addition to such a procedure, is to 





Chart 202. United States Magazine Advertising Cyclical-Irregular Movements Com-
puted by Three Methods, 1921-1937. A. Per Cent of Corresponding Month in Pre-
ceding Year; B. Per Cent of “Normal”; C. Per Cent of Average of Corresponding
Month in Three Preceding Years. (For original data see Table 109, Col. 2; for cyclical-
irregular movements by residual method see Table 128.)

fit a mathematical curve to the data. The simple procedure here will
seldom be found appropriate, since cycles seldom exhibit a simple period- 
icity, but the method is to be regarded as an introduction to more complex 
methods of the same general nature. Non-acetate rayon deliveries, how- 
ever, seem to exhibit a fairly regular 2-year cycle, and hence these data 
were selected for purposes of illustration. According to Stanley B. Hunt, 
editor of Rayon Organon, “the chief cause of this textile production cycle 
apparently is a periodic fluctuation in the stocks or inventories of goods 



Chart 203. Cyclical Movements of Non-Acetate Rayon Deliveries, and Sine-Cosine
Curve. (Data of Tables 131 and 133.)

held at all stages of production and distribution.”³ These data were ad-
justed for trend and seasonal, and, on account of violently irregular move-
ments, were partially smoothed by a 5-month moving average, weighted 
1, 2, 4, 2, 1. These data are shown in Table 131 and Chart 203 (solid line). 

The procedure falls into two steps. (1) By means of periodogram an-
alysis an attempt is made to discover the periodicity of the data and its
average cyclical pattern. (2) A periodic curve is fitted to the average 
pattern and applied to the series being analyzed. 

Periodogram analysis. Our first objective is to discover the periodicity 
of the rayon data. Let us assume that the periodicity is 24 months. In 
Table 132 the data of Table 131 through December 1934 are arranged in 
rows of 24 successive items. (December 1934 was arbitrarily chosen as a 
terminating point for this illustration since the data in the following years 
do not conform so well to the general 2-year pattern). The average of
each column is now taken.⁴ The highest average is 122.5, and is for the

³ See Textile Economics Bureau, Inc., Rayon Organon, December 8, 1937, p. 171.

⁴ The average selected was the arithmetic mean, although a median or modified
mean might be used.

* Partly estimated.

Source: Original data from Textile Economics Bureau, Inc., Rayon Organon, Special Supplement, January 22, 1937, pp. 20-21.

fifth month; whereas the lowest average, 81.2, is found for the seventeenth
month. The range from high month to low month is 122.5 - 81.2 = 41.3.

We now repeat this procedure with an arrangement of 23 successive 
items in a row. The table is not reproduced here, but the reader can 
easily verify that the last number in the first row would be 105.9, which 
refers to November 1924, and that the final item of the entire series (23rd 
item of row 6) would be 88.4, the cyclical relative for June 1934. Actual 
computation gives a range of means of 27.5. Repeating again with a 
25-month period, running through June 1935, we find a range of 37.7. 


Chart 204. Periodogram of Cyclical Movements of Non-Acetate Rayon Deliveries,
1923-1934. (Vertical scale: range of means; horizontal scale: number of months.
For data see below.)

This procedure, with different assumptions as to periodicity, must be re- 
peated until the statistician is satisfied that he has discovered the true 
periodicity. This is taken as the periodicity that gives the greatest range 
of column means. For nine different trials the results are as follows: 


Assumed periodicity        Range of
     (months)            column means

        21                   11.6
        22                   17.7
        23                   27.9
        24                   41.3
        25                   37.7
        26                   32.1
        27                   29.0
        28                   20.7
        29                   15.4

These results are shown graphically in Chart 204, known as a periodogram.
The column means of Table 132 supply us with numerical values for
the average cyclical behavior of rayon consumption. Following the pro- 
cedure adopted in the computation of seasonal indexes, however, we shall 
adjust the values to average 100 per cent. These adjusted data are in the 
last row of Table 132, and are shown by the solid line of Chart 205. 
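The trials of the periodogram can be set out compactly in Python. In the sketch below the series is synthetic (a pure 24-month wave, not the rayon data of Table 131), and the function name is illustrative. Trailing months that do not fill a complete row are simply dropped, which is an assumption of this sketch rather than the procedure of the text.

```python
import math


def range_of_column_means(series, period):
    """Arrange the series in rows of `period` items and return the
    difference between the highest and lowest column means."""
    n_rows = len(series) // period
    if n_rows == 0:
        raise ValueError("series is shorter than the assumed periodicity")
    means = []
    for col in range(period):
        column = [series[row * period + col] for row in range(n_rows)]
        means.append(sum(column) / n_rows)
    return max(means) - min(means)


# Synthetic cyclical relatives: 144 months of a regular 24-month cycle.
synthetic = [100 + 20 * math.sin(2 * math.pi * m / 24) for m in range(144)]

periodogram = {t: range_of_column_means(synthetic, t) for t in range(21, 30)}
best = max(periodogram, key=periodogram.get)
print(best, round(periodogram[best], 1))
```

For any assumed periodicity other than 24 the successive rows drift out of phase, the highs and lows partly cancel in the column means, and the range is smaller.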



Chart 205. Average Cyclical Pattern of Non-Acetate Rayon Deliveries, 1923-1934
(Solid Line) and Sine-Cosine Curve (Dashed Line). (Data of Table 133.)


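Fitting a periodic curve. We shall now fit a sine-cosine curve to the ad-
justed cyclical pattern (Y' values). The curve type is

$$Y_c = \bar{Y}' + A \sin\left(\frac{360}{T}X\right)^\circ + B \cos\left(\frac{360}{T}X\right)^\circ,$$

where

$$T = \text{the periodicity in months},$$
$$A = \frac{2}{T}\sum Y' \sin\left(\frac{360}{T}X\right)^\circ,$$
$$B = \frac{2}{T}\sum Y' \cos\left(\frac{360}{T}X\right)^\circ.$$

A further observation concerning this equation is that the range of the
fitted curve from peak to trough is $2\sqrt{A^2 + B^2}$.

Since T has been found to be 24, and $\bar{Y}' = 100$, we may immediately
write:

$$Y_c = 100 + A \sin (15X)^\circ + B \cos (15X)^\circ,$$
$$A = \frac{\sum [Y' \sin (15X)^\circ]}{12},$$
$$B = \frac{\sum [Y' \cos (15X)^\circ]}{12}.$$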
Computations for fitting this curve are found in Table 133. Note that
the first value of X is taken as 1 instead of the usual 0. Sines and cosines
for columns 3 and 4 are read from Appendix L. The Y' values are the 
adjusted values in the last row of Table 132. Substituting in the formulae 
above, we find: 


$$A = \frac{186.23}{12} = 15.519,$$
$$B = \frac{58.96}{12} = 4.9133.$$

The equation, then, is $Y_c = 100 + 15.519 \sin (15X)^\circ + 4.9133 \cos (15X)^\circ$.
The last three columns of the table are self-explanatory. The highest
computed value, 116.3, we find in the fifth month, and the lowest, 83.7,
in the seventeenth month. The range, therefore, is 116.3 - 83.7 = 32.6.
This checks with $2\sqrt{A^2 + B^2} = 2\sqrt{(15.519)^2 + (4.9133)^2} = 2 \times 16.3 = 32.6$.
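The fitted curve can be checked with a short Python sketch. The sums 186.23 and 58.96 are those of Table 133; the function names are illustrative. The last lines fit a made-up pattern whose constants are known beforehand, as a check on the formulas for A and B.

```python
import math


def fit_sine_cosine(pattern):
    """A and B for a pattern Y' given for X = 1, 2, ..., T."""
    t = len(pattern)
    a = 2 / t * sum(y * math.sin(math.radians(360 * x / t))
                    for x, y in enumerate(pattern, start=1))
    b = 2 / t * sum(y * math.cos(math.radians(360 * x / t))
                    for x, y in enumerate(pattern, start=1))
    return a, b


def curve(x, a, b, t=24, level=100):
    """Computed value Yc for month X."""
    angle = math.radians(360 * x / t)
    return level + a * math.sin(angle) + b * math.cos(angle)


# Constants from the sums of Table 133.
A = 186.23 / 12
B = 58.96 / 12
computed = {x: curve(x, A, B) for x in range(1, 25)}
high_month = max(computed, key=computed.get)
low_month = min(computed, key=computed.get)
peak_to_trough = 2 * math.hypot(A, B)

# Made-up pattern with known constants 3 and 4, to check the fitting formulas.
check_a, check_b = fit_sine_cosine([curve(x, 3.0, 4.0) for x in range(1, 25)])
print(high_month, round(computed[high_month], 1),
      low_month, round(computed[low_month], 1), round(peak_to_trough, 1))
```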

The computed values of column 10 are shown by the broken line of 
Chart 205. The fit is good from the eleventh month on, though much is 
to be desired in the early part of the curve. The same curve is shown by 
the broken line of Chart 203. Since sin 375° is the same as sin 15°, and
cos 375° is the same as cos 15°, Yc is the same for X = 25 as for X = 1.
Likewise, Yc when X = 26 is the same as when X = 2, and so on. The
fitted curve therefore repeats itself each 24 months. As can be seen, the 
sine-cosine curve fits reasonably well from 1923 through 1934, but the 
extension through 1937 is unsatisfactory. As is so frequently the case, 
generalizations which are valid historically do not work when extended 
into the future. The reason is that conditions change and causes which 
were important in the past give way to newer and more potent causes. 
Again it might be noticed that time series since about 1933 have become 
less regular in their behavior. Another difficulty with the simple, though
somewhat laborious, procedure which has been illustrated, is that the 
amplitude of the fitted curve is smaller than the amplitude of the original 
data. This is because the length of each cycle is not exactly the same, 
and therefore the column of the periodogram table which contains the 
high mean does not contain all the cyclical highs, nor does the column 
with the low mean contain all the cyclical lows. For this reason the 
average pattern (to which the periodic curve is fitted) has smaller ampli- 
tude than the original data. 


Cyclical Averages 

The method of harmonic analysis which has been discussed assumes: 
(1) that cycles are a variety of periodic movement; (2) that they are 
similar in pattern; (3) that the pattern can be described by a mathematical
equation. In practice it is found that most economic series are not peri-
odic, and that it is difficult adequately to describe them by mathematical
curves. Wesley C. Mitchell, in studying business cycles, has come to the
conclusion that different cycles of a given series are sufficiently alike in
pattern to justify averaging them together and making a number of meas-
urements of average behavior. Although Mitchell's method is not widely
used, the importance of his extensive studies justifies a brief description
of some of his basic procedures.⁵ Computations are carried out by the
National Bureau of Economic Research under Mitchell's direction.

As a preliminary step Mitchell adjusts the data for seasonal variation 
but not for trend within cycles, since he believes the study of combined 
cycle and intra-cycle trend to be useful. Using the deseasonalized data, 
average patterns are obtained for specific cycles and for reference cycles, as 
will be described in turn. 

Specific cycle analysis. Although no adjustment is made for intra-cycle 
trend, the data are adjusted for trend between cycles (inter-cycle trend) 
before averaging the different cycles. This is done by expressing each 
individual month as a percentage of the average for that cycle. In order 
to do this, it is necessary to break the series into specific cycles each run- 
ning from low to low. No objective procedure is adopted for accomplish- 
ing this, but turning points are selected largely by inspection of a chart 
of deseasonalized data, such as Chart 206. On this chart the peaks and 
troughs selected for the specific cycles of United States magazine adver- 
tising are marked by asterisks. The dates selected are given as follows. 
By definition, revival occurs in the month following the cyclical trough; 
similarly, recession refers to the month following a cyclical peak. 

Dates of Specific Cycles in United States Magazine Advertising

Cycle   Initial revival    Peak             Trough            Terminal revival
  1     December 1918      August 1920      October 1921      November 1921
  2     November 1921      April 1924       September 1924    October 1924
  3     October 1924       November 1926    January 1928      February 1928
  4     February 1928      July 1929        June 1933         July 1933


Now the average value of the deseasonalized data for each cycle from
revival through trough is obtained. 


          Total of                Duration     Average
Cycle     deseasonalized data     of cycle     value
          (thousands of lines)    (months)     for cycle

  1             80,037               35          2,287
  2             81,703               35          2,334
  3            113,945               40          2,849
  4            168,781               65          2,597
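The durations and averages above follow directly from the dates of the specific cycles, as the Python sketch below shows. The names are illustrative; the last function states the conversion of a month to a percentage of its cycle average, with a made-up monthly figure as an example.

```python
def months_inclusive(start, end):
    """Number of months from `start` through `end`, each given as (year, month)."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1]) + 1


# (initial revival, trough, total of deseasonalized data in thousands of lines)
cycles = [
    ((1918, 12), (1921, 10), 80037),
    ((1921, 11), (1924, 9), 81703),
    ((1924, 10), (1928, 1), 113945),
    ((1928, 2), (1933, 6), 168781),
]

durations = [months_inclusive(revival, trough) for revival, trough, _ in cycles]
averages = [total / months for (_, _, total), months in zip(cycles, durations)]


def cycle_relative(value, cycle_average):
    """A month's deseasonalized value as a per cent of its cycle average."""
    return 100 * value / cycle_average


print(durations, [round(a) for a in averages])
# A hypothetical month of 2,500 thousand lines falling in cycle 1:
print(round(cycle_relative(2500, averages[0]), 1))
```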


⁵ See also Wesley C. Mitchell and Arthur F. Burns, The National Bureau's Measures
of Cyclical Behavior, Bulletin 57, July 1, 1935, of the National Bureau of Economic
Research, New York.

Next the deseasonalized data are divided by the average cycle value
for each cycle and multiplied by 100. The percentages so obtained include
for each cycle a span from the month preceding initial revival (that is,
the trough) through the month following the terminal revival.⁶ For in-
stance, the percentages for the first cycle run from November 1918 through
December 1921, and for the second cycle from October 1921 through No-
vember 1924. It is to be noticed that October, November, and December,
1921, are included as the last three months of cycle 1 and the first three
months of cycle 2. Although there is an overlapping of months between
cycles, the percentages for these months are different in the two cycles
because they are based on different averages. Since these data are basic,
they are shown as Table 134.

Stages of Cycle 1, United States Magazine Advertising

Stage                                 First         Month on which     Last          Duration in
number    Name of stage               month         centered           month         months*

I         Initial revival                           Dec. 1918                             1
          Period of expansion:
II          First third               Jan. 1919     Mar.-Apr. 1919     June 1919          6
III         Second third              July 1919     Oct. 1919          Jan. 1920          7
IV          Last third                Feb. 1920     May 1920           Aug. 1920          7
V         Recession                                 Sept. 1920                            1
          Period of contraction:
VI          First third               Oct. 1920     Nov.-Dec. 1920     Jan. 1921          4
VII         Second third              Feb. 1921     Mar.-Apr. 1921     May 1921           4
VIII        Last third                June 1921     Aug. 1921          Oct. 1921          5
IX        Terminal revival                          Nov. 1921                             1

* Whenever the months in a period are not divisible by three, the adjustment is made in the middle
stage. For this purpose, initial revival is considered to be a part of the first third of expansion, and re-
cession a part of the first third of contraction.

After inter-cycle trend is eliminated as described, the different cycles
are further made comparable by dividing each into nine stages. The first
stage is called initial revival, and is the month following the initial trough.
The next three stages are equal thirds of the period of expansion, which
runs from the month following that of initial revival through the peak.
The fifth stage, that of recession, is the month following the peak. The


⁶ These percentages are necessary in order to obtain averages for the nine stages
which are defined in the following discussion. Possibly it might have been better, here
and later, to have had stages I and IX represent the troughs instead of the revival
months and stage V the peak instead of the recession month; but in order to avoid
confusion, we are following Mitchell's procedure.




TABLE 134

United States Magazine Advertising Specific Cycles, November 1918-August 1933

(Seasonally adjusted data expressed as per cent of average for each cycle)

Source: 1921-1933, derived from Table 128. For source of original data, see Table 85.

next three stages are equal thirds of the period of contraction, which runs
from the month following that of recession through the trough. The last 
stage is called terminal revival, and is the month following the terminal 
trough. In order to clarify these statements, we show, in tabular form, 
on page 564, the nine stages of the first specific cycle of magazine adver-
tising.

Note that, although the number of months in the last column totals 36, 
the duration of the cycle, from December 1918 through October 1921, is 
only 35 months. This is a formal discrepancy due to the overlapping of 
the stage of terminal revival of one cycle with the period of initial revival 
of the next. 
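The division of a cycle into nine stages can be sketched in Python as below. The rule for periods not divisible by three is an assumption of this sketch: the outer thirds are taken as the nearest whole number of months and the middle third absorbs the remainder, which agrees with the tabulation for cycle 1 but is not confirmed for other cases. The function names are illustrative.

```python
def split_in_thirds(n_months):
    """Split a period into three parts, any adjustment falling in the middle.

    Assumption: each outer third is n/3 rounded to the nearest whole month.
    """
    outer = round(n_months / 3)
    return outer, n_months - 2 * outer, outer


def stage_durations(expansion_months, contraction_months):
    """Durations of stages I through IX, in months.

    expansion_months   counts initial revival through the peak;
    contraction_months counts recession through the trough.
    """
    e1, e2, e3 = split_in_thirds(expansion_months)
    c1, c2, c3 = split_in_thirds(contraction_months)
    # Revival and recession are single months counted inside the first thirds.
    return [1, e1 - 1, e2, e3, 1, c1 - 1, c2, c3, 1]


# Cycle 1: December 1918 through August 1920 is 21 months;
# September 1920 through October 1921 is 14 months.
cycle_1 = stage_durations(21, 14)
print(cycle_1, sum(cycle_1))
```

The total of 36 exceeds the 35-month duration by the one month of terminal revival, which belongs also to the next cycle.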

A standing or average value for each stage of each cycle is now computed 
from the data of Table 134. Standings for stages I, V, and IX are taken 
as the average of the three months centering respectively on the months 
of initial revival, recession, and terminal revival. This is done in order
to obtain more representative values for these stages. The standings for 
the different stages of the first cycle are found to be as follows: 

Cyclical Pattern of Cycle 1

Stage     Standing

I            59.6
II           80.9
III         107.3
IV          128.1
V           130.7
VI          111.2
VII          82.9
VIII         80.2
IX           72.8


Since the pattern of each cycle is not exactly the same, an average, for all 
cycles, of stage I is obtained, and of stage II, and so on. The data and 
results are shown in Table 135. The averages of this table constitute the 
average pattern of specific cycles in United States magazine advertising.
As a very rough indication of the reliability of these averages, average 
deviations are shown at the bottom of the table. 

Reference cycle analysis. The object of reference cycle analysis is to 
determine how a specific series behaved, on the average, during cycles in 
general business. The analysis is carried through in precisely the same 
fashion as for specific cycles, except that the dates chosen for revivals, 
peaks, and troughs are those of general business cycles rather than those 
of the specific series being analyzed. These reference cycle dates are estab-
lished subjectively after examination of the material in Thorp's Business
Annals (National Bureau of Economic Research, New York, 1926) and
various statistical series. Post-war reference cycle dates, as established
by Mitchell, are as follows:

                                                                  Duration
Cycle   Revival          Peak             Trough                  (months)

  1     May 1919         January 1920     September 1921             29
  2     October 1921     May 1923         July 1924                  34
  3     August 1924      October 1926     December 1927              41
  4     January 1928     June 1929        March 1933                 63

Analysis along the lines indicated gives the following results:

Stage     Average value     Average deviation

I              90.0               10.2
II             94.6               11.0
III           101.0                8.2
IV            108.4                8.8
V             115.8                6.9
VI            114.9                5.4
VII           104.3                5.6
VIII           91.4               20.6
IX             83.9               21.9



Comparison of reference and specific cycles. From the measures al- 
ready illustrated or described a number of interesting comparisons can be 
made. Many of these may most easily be seen by study of a chart, such 
as Chart 207. The horizontal scale represents time, while the vertical is 
per cent of average. First, notice that solid lines refer to the reference 
cycle pattern, while broken lines refer to the specific cycle pattern. If the 
turning point of each specific cycle coincided with the reference cycle 
dates, the two curves would be exactly the same. Variation between the 
two sets of dates produces two results: (1) variation in the pattern of the 
two curves; (2) smaller amplitude in the reference cycle pattern. In the 
present instance the conformity between the two series is high. (Among 
other measures that Mitchell makes are indexes of conformity, which will 
not be described here.) This indicates that magazine advertising is closely 
related to general business activity, as cause or effect, or in some other 
fashion. 

The chart also indicates that magazine advertising tends to lag behind 
general business in its turning points. The facts concerning lag or lead 
of specific cycles with respect to the reference cycle are obtained by com-
paring the dates shown on page 562 and above, and summarized on page
569. Thus in Chart 207 the observation representing the first stage (initial
revival) of the specific cycle is placed one-fourth of a month to the left
of that representing the same stage of the reference cycle, while the
fifth stage (recession) and the ninth stage (terminal revival) show the
reference cycle further to the left by 5 and 1.75 months respectively. Not
much reliance can be placed on the lead of magazine advertising at initial
revival, however, since the negative average is due entirely to the lead of
advertising one time out of four. The other two averages are probably
significant; this belief is indicated by the two arrows pointing left on the
chart. The length of the arrows indicates the number of months' lead,
and the direction that the arrows point tells which leads (the specific cycle
if the arrow points right; general business if the arrow points left).

Chart 207. Cyclical Pattern of United States Magazine Advertising, 1918-1933.
(Four specific cycles, 1918-1933; four reference cycles, 1919-1933.) For specific
cycle pattern see Table 135; for further explanation read pp. 568-570.

Another interesting feature of this chart is the vertical lines at the top
and bottom of the chart. These are on the same scale as the main part
of the chart, and indicate the average deviation of the standings for the
different stages. These average deviations and the averages to which they
refer are placed in a position along the horizontal scale proportional to
the time elapsed from the center of one stage to the center of another.
Finally it should be noted that the duration of the cycle is indicated by
the length of the long horizontal line above the chart in the case of the
specific cycle, and below it in the case of the reference cycle. These hori-
zontal lines form the base lines with which the vertical lines are connected.
The shorter horizontal lines paralleling the ones described refer to the
average deviation of the duration of the cycles.

Lead (-) or Lag (+) of Specific Cycle at Turning
Points of Reference Cycle, in Months

(When the average is negative, the standing of the specific cycle at re-
vival or recession is plotted to the left of that reference cycle stage in
Chart 207; when it is positive, the specific cycle standing is plotted to the
right.)

            At initial      At reference     At terminal
Cycle       reference       peak             reference
            trough                           trough

  1            -5               +7               +1
  2            +1              +11               +2
  3            +2               +1               +1
  4            +1               +1               +3

Average      -0.25             5.00             1.75
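The leads and lags follow from the two sets of cycle dates and may be recomputed as in the Python sketch below. Troughs are taken as the month preceding each revival, in accordance with the definition given earlier; the names are illustrative.

```python
def month_number(year, month):
    """Months counted on a single continuous scale."""
    return 12 * year + month


def lead_or_lag(specific_dates, reference_dates):
    """Lead (-) or lag (+) of the specific cycle, in months."""
    return [month_number(*s) - month_number(*r)
            for s, r in zip(specific_dates, reference_dates)]


# Turning points as (year, month); troughs are the months preceding revivals.
specific = {
    "initial trough": [(1918, 11), (1921, 10), (1924, 9), (1928, 1)],
    "peak": [(1920, 8), (1924, 4), (1926, 11), (1929, 7)],
    "terminal trough": [(1921, 10), (1924, 9), (1928, 1), (1933, 6)],
}
reference = {
    "initial trough": [(1919, 4), (1921, 9), (1924, 7), (1927, 12)],
    "peak": [(1920, 1), (1923, 5), (1926, 10), (1929, 6)],
    "terminal trough": [(1921, 9), (1924, 7), (1927, 12), (1933, 3)],
}

timing = {name: lead_or_lag(specific[name], reference[name]) for name in specific}
averages = {name: sum(values) / len(values) for name, values in timing.items()}
print(timing)
print(averages)
```

The negative average at the initial trough arises wholly from the first cycle, as noted in the text.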

In connection with this type of analysis, still other measures of cyclical 
behavior are computed at the National Bureau of Economic Research.
For instance, there are measures of cyclical amplitude, of percentage growth
between cycles, of change from stage to stage. Space does not permit a
discussion of these measures, but the reader is referred to the National
Bureau of Economic Research Bulletin 57 (July 1, 1935), The National
Bureau's Measures of Cyclical Behavior, by Wesley C. Mitchell and Arthur
F. Burns; and to mimeographed editions of the first three chapters of 
Mitchell’s forthcoming book, Business Cycles, Volume II, Analysis of 
Cyclical Behavior. 

Several methods of cycle analysis have been presented in this chapter. 
They each have strong points and weak points. The orthodox method, 
which isolates cycles by means of adjusting for all other types of move-
ments, is probably on the whole the most satisfactory. Direct methods,
though less laborious than this method, are difficult to interpret. The 
fitting of periodic curves is an attempt to generalize concerning cyclical 
behavior, but the regularities which it attempts to measure are not usually 
existent. The method of averages also attempts to generalize concerning 
cycles. Although it does not assume the same degree or kind of regularity, 
nevertheless it is not completely satisfactory. We are not certain that 
differences among cycles over a period of time are largely accidental. 
Also, the average deviations of Chart 207 are rather large, and, with only 
four cycles as observations, the averages plotted could not under any cir- 
cumstances be considered very reliable. To the extent that cyclical be- 
havior gradually changes, however, it might be possible to modify the 
method so as to give some idea of such trend. 
CHAPTER XX


FUNDAMENTALS IN INDEX NUMBER 
CONSTRUCTION 


Meaning and Uses of Index Numbers

Index numbers are devices for measuring differences in the magnitude 
of a group of related variables. These differences may have to do with 
the price of commodities, the physical quantity of goods produced or mark-
eted, or such concepts as “intelligence,” “beauty,” or “efficiency.” The
comparisons may be between periods of time; between places; between like 
categories, such as persons, schools, or objects. Thus we may have index 
numbers comparing the cost of living at different times or in different 
countries, the physical volume of production in different years, or the 
efficiency of different school systems. A few uses to which index numbers 
are put are described below. 

(1) Perhaps the most common type of index is that of the change in 
price level over a period of time. One use of such index numbers, with 
which the reader is already familiar, is that of deflating a value series in 
order to convert it into physical terms. Referring back to Chapter XIV,
Table 79, we find that hourly wages were reduced to hourly real wages by 
dividing by an index of the cost of living. Similarly, we might wish to 
convert a time series representing value of construction contracts awarded 
to a physical basis by deflating with an index of construction costs. 

(2) Price movements may be studied in order to discover their cause, 
or their effect on the economic community. In order to study such eco- 
nomic relationships, it is customary to compare changes in the price level 
with changes in other series, such as gold, bank reserves, bank deposits, 
bank debits, and the physical volume of production. Such studies may 
involve, not only the average change in price relatives, but also: (a) dis-
persion of price relatives; (b) shape of frequency distributions of price
relatives; (c) alterations in the relative positions of such percentages (dis-
placement of prices); (d) magnitude of change in price with changes in
quantity offered for sale; (e) magnitude of changes in purchases or pro-
duction with changes in price (elasticity of demand or supply); (f) fre-
quency with which different prices change; (g) magnitude of price changes
with changes in demand.¹

(3) Changes in the price level may be measured in order to control 
them. Thus the increase in official price of gold in 1933-1934 was in part 
an attempt to raise the price level. If index numbers showed the price 
level to be higher after the price of gold was raised, this result might be 
taken as an indication that the gold policy was effective. 

Occasionally, governmental influence is exercised not to raise, lower, or 
stabilize the price level, but to raise one group of prices relative to another. 
Thus the United States Government has considered various devices, and 
tried some, to raise agricultural prices to a “parity” with industrial prices.

(4) Occasionally a contract is made in such a way that the effects of 
changes in the purchasing power of the dollar are minimized. Thus the 
Philadelphia Rapid Transit Company agreed in 1926 to adjust wages an- 
nually in such a way that, regardless of price changes, the pay envelope 
would always support the same standard of living. Obviously, then, it 
was necessary to construct an index of the cost of living of their employees 
in order to determine changes in wage rates. 

(5) Closely related to the use just mentioned is that of estimating for 
rate-making purposes the reproduction cost of utilities. It is very labori- 
ous to re-appraise properties at frequent intervals; but, once having arrived 
at a satisfactory valuation as of a particular time, it is a simple matter to 
revise the valuation at frequent intervals in accordance with changes in 
the level of a price or cost of production index. The legal status of index 
numbers for such purposes is in doubt; however, the United States Supreme 
Court has held that, if index numbers are to be so used, the particular 
index employed must be one that is appropriate to the particular purpose 
in view. An index of “the general price level,” for instance, is clearly
inappropriate.²

(6) Illustrations of average price comparisons among different regions 
are not common. It is very difficult to make such comparisons since the 
relative importance of goods produced and/or consumed in the different 


¹ Much careful study along all of these lines except elasticity of demand has been
done by Frederick C. Mills. The results have been published by the National Bureau
of Economic Research in The Behavior of Prices (1927). The idea of elasticity has been
developed by Cournot, Alfred Marshall, and Henry L. Moore. Henry Schultz has
contributed greatly to the statistical measurement of elasticity of demand and supply.
See his book, The Theory and Measurement of Demand, the University of Chicago Press,
Chicago, 1938. Magnitude and frequency of price change have been studied statis-
tically by Gardiner C. Means and published in Industrial Prices and Their Relative
Inflexibility, Senate Document No. 13, 74th Congress, 1st Session.

² See “Index Numbers and Public Utility Valuation,” by Robert W. Harbeson,
Journal of the American Statistical Association, Vol. 31, June 1936, pp. 245-257.
places varies so widely. However, the National Industrial Conference
Board has compiled an index of the cost of living in 1927 in twelve in-
dustrial cities, with the object of comparing the “differences in the cost of
maintaining an established standard of living” between different regions
and between different cities in the same region. 

(7) There are several organizations that compile indexes comparing 
physical changes over a period of time. These relate to the physical vol- 
ume of trade, industrial production, factory production, sales, stocks of 
goods, etc. We have already used such indexes in our analysis of time 
series. They are extremely useful for the historical study of secular trends, 
seasonal variations, and business cycles, and are indispensable for persons 
who wish to keep abreast of current business conditions. 

(8) Forecasting indexes are compiled by most forecasting organiza- 
tions. Although many of the indexes seem sound in theory, and in prac- 
tice when applied to periods before they were actually used, unfortunately 
most of them do not work when put to current use. It is also not uncom- 
mon to find that a forecasting index works satisfactorily during periods of 
mild prosperity and depression but fails during a severe depression. Fore- 
casting is discussed more fully in Chapter XXV. 

(9) It would appear from the above discussion that most indexes are 
price indexes. Historically they have been in use longer, and currently 
they are probably the most numerous. Quantity indexes are much more 
important than the amount of space devoted to them in paragraph 7, 
above, would indicate. Other varieties of indexes are diverse in nature 
and few in number. As an illustration of one type may be mentioned an 
index of school efficiency. Following the pioneer work of Leonard P. 
Ayres, who in 1920 published index numbers of the rating of state school 
systems, a number of similar studies have been undertaken.³ Among the
factors most commonly combined in the general index are: (a) school days
per year; (b) per cent of school population attending schools daily; (c)
ratio of high school enrollment to total enrollment; (d) average expendi-
ture per pupil in average daily attendance; (e) average expenditure per
pupil for purposes other than salaries; (f) average salary of teachers. 
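
The deflation mentioned under use (1) above can be illustrated with a minimal sketch in Python. The wage and cost-of-living figures below are hypothetical (they are not those of Table 79), and the function name deflate is ours, introduced only for illustration.

```python
def deflate(values, price_index, base=100.0):
    """Divide each value by its price index (base = 100) to express it in base-period prices."""
    if len(values) != len(price_index):
        raise ValueError("the two series must be the same length")
    return [v * base / p for v, p in zip(values, price_index)]


# Hypothetical hourly money wages (dollars) and cost-of-living index (base year = 100).
money_wages = [0.50, 0.55, 0.60]
cost_of_living = [100.0, 110.0, 125.0]

real_wages = deflate(money_wages, cost_of_living)
for period, (money, real) in enumerate(zip(money_wages, real_wages), start=1):
    print(f"period {period}: money wage {money:.2f}, real wage {real:.3f}")
```

In this made-up series the money wage rises in every period, yet the real wage is unchanged in the second period and lower in the third, which is the kind of fact that deflation is meant to bring out.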

An index number is obtained by combining a number of variables by 
means of a total or an average. This statement will be clarified by refer-
ence to Table 136. In column 2 is a single price series of common build-
ing brick, and in column 3 is a series of relatives based upon these prices.
In column 4, however, there is a series of index numbers based on various 
kinds of brick and varieties of tile, which may be referred to collectively 

³ For an analysis and bibliography of these studies, together with a brief description
of several, see “Estimating State School Efficiency,” Research Bulletin of the National
Education Association, Vol. X, No. 3, May 1932, especially pp. 104-112.
as a price index. These index numbers may be constructed by combining
year by year, in a manner which will be described, common building brick
prices and prices of other commodities in the brick and tile group. In
column 3 the brick prices are expressed relative to 1926 as 100. Such a
series is a series of relatives, price relatives in this case. Index numbers
can be constructed also by averaging the price relatives of each year sep-
arately. The first method is usually referred to as the aggregative method,
while the second is that of averaging price relatives. These explanations
will become clearer as they are developed more fully.

TABLE 136

Price and Price Relatives of Common Building Brick, and
Brick and Tile Price Index, 1926-1937

                 Common building brick
             -----------------------------    Brick and tile
  Year         Price        Price relative    index number
             per 1,000         to 1926        (1926 = 100)
   (1)          (2)              (3)              (4)

  1926        $13.913           100.0            100.0
  1927         14.024           100.8             95.7
  1928         13.718            98.6             95.6
  1929         13.621            97.9             94.3
  1930         13.050            93.8             89.8
  1931         12.396            89.1             83.6
  1932         11.214            80.6             77.3
  1933         11.047            79.4             79.2
  1934         12.591            90.5             90.2
  1935         12.341            88.7             89.4
  1936         12.313            88.5             88.7
  1937         12.647            90.9             93.5

Source: United States Bureau of Labor Statistics, Wholesale Prices, bulletins
of various years. See source note of Table 138.
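
The distinction between a series of relatives and the two ways of combining series may be sketched in Python as follows. The brick prices are those of column 2 of Table 136; the second commodity and its prices are hypothetical, and both combining functions are the simplest unweighted forms, assumed here only for illustration, since the question of weighting has not yet been taken up.

```python
# Column 2 of Table 136: price of common building brick per 1,000.
brick_price = {1926: 13.913, 1927: 14.024, 1928: 13.718, 1929: 13.621}


def relatives(prices, base_year):
    """Express each price as a percentage of the base-year price."""
    base = prices[base_year]
    return {year: 100.0 * price / base for year, price in prices.items()}


brick_relative = relatives(brick_price, 1926)
print({year: round(rel, 1) for year, rel in brick_relative.items()})

# Hypothetical second commodity, used only to show the two ways of combining.
tile_price = {1926: 40.0, 1927: 38.0, 1928: 37.0, 1929: 36.0}
series = [brick_price, tile_price]


def simple_aggregative(series, year, base_year):
    """Unweighted aggregative index: sum of prices over sum of base-year prices."""
    return 100.0 * sum(s[year] for s in series) / sum(s[base_year] for s in series)


def mean_of_relatives(series, year, base_year):
    """Unweighted arithmetic mean of the separate price relatives."""
    rels = [relatives(s, base_year)[year] for s in series]
    return sum(rels) / len(rels)


for year in (1927, 1928, 1929):
    print(year,
          round(simple_aggregative(series, year, 1926), 1),
          round(mean_of_relatives(series, year, 1926), 1))
```

The two results differ because the unweighted aggregate is dominated by the commodity with the larger price per unit, whereas the mean of relatives gives each commodity the same importance.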

Problems in the Construction of Index Numbers 

Among the problems which the statistician encounters in index number 
construction are: 

(1) Selection of series for inclusion in index. 

(2) Selection of source of data. 

(3) Selection of base. 

(4) Method of combining data. 

(5) System of weighting. 

Not all of these problems are of equal importance, nor are they always 
independent of one another. Thus a simple system of weighting would 
require a different, and usually larger, list of commodities for a price index
than a method that employs a separate weighting system for each
subgroup of an index. Likewise, as will be explained later, the weighting
system to use depends in part upon the method of combining the data. 
It is convenient to include both the method and the system of weighting 
in one formula, and to discuss both points in the same section. Likewise, 
problems 1 and 2, noted above, will be considered together. A more com-
plete understanding of these points will result if the behavior of price
relatives is considered first. 

An Illustration of the Behavior of Price Relatives 

The United States Bureau of Labor Statistics at the present time com- 
piles an index of wholesale prices consisting of 813 separate commodities 
or series. It also computes price relatives for each of these series. From 
these price relatives frequency distributions have been made, and deciles 
subsequently computed for July of each year from 1926 to date.⁴ In
Chart 208 the first, third, fifth, seventh, and ninth deciles are shown.
(The fifth decile is, of course, the median.) 
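
The computation of deciles from a set of price relatives may be sketched in Python with the standard library. The eighteen relatives below are hypothetical, and the interpolation rule (the "inclusive" method of statistics.quantiles) is an assumption of ours; the source does not state how the plotted deciles were interpolated from the frequency distributions.

```python
import statistics

# Hypothetical price relatives (per cent of base-year price) for a single date.
price_relatives = [62.0, 70.5, 74.0, 78.2, 81.0, 83.5, 85.0, 86.4, 88.0,
                   90.1, 92.3, 95.0, 97.5, 100.0, 104.2, 110.0, 118.5, 131.0]

# quantiles(..., n=10) returns the nine cut points, the first through ninth deciles.
deciles = statistics.quantiles(price_relatives, n=10, method="inclusive")

for k in (1, 3, 5, 7, 9):
    print(f"decile {k}: {deciles[k - 1]:.1f}")

# The fifth decile coincides with the median.
print("median:", statistics.median(price_relatives))
```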

First, it should be noticed that there is an evidence of central tendency 
among the movements of the price relatives, as evidenced by the fact that 
the central bands are generally narrower than those at the top and bottom. 
This suggests that the movements of prices are not entirely random, but 
are bound together by some underlying force of a monetary or other
nature. Possibly this force is only the interdependency of prices, of which 
the economists speak. While this central tendency is marked in most 
years, it is not so clear cut in some years as in others. For instance, in 
1932 the tendency is not at all apparent. 

Chart 209 suggests a possible explanation of this situation. In section A 
are plotted indexes of farm products, foods, and all commodities other than 
farm products and foods. The price of farm products dropped greatly 
during the depression, the price of foods somewhat less, while other com- 
modities remained fairly stable in price. Not only is agriculture composed 
of a large number of small scale farmers and total farm production not very 
responsive to price, but also the demand for farm products is inelastic. 
Consequently, as our export market diminished during the depression and 
as domestic demand fell off, it was necessary to lower the prices consider- 
ably in order to sell the crops. Section B of this chart, the indexes of 


⁴ Distributions by months, July 1927-July 1936 for all commodities have been com-
piled by Leonard Ascher. See “Variations in Price Relative Distributions, 1927 to
1936,” by Leonard Ascher, Journal of the American Statistical Association, Vol. 32,
No. 198, pp. 271-280. The deciles were computed by the writers from his July dis-
tributions. The deciles for building materials appearing in Chart 210 were also com-
puted by the writers and refer to average prices for the entire year.
which overlap considerably those of section A, shows the same general
picture. Although the demand for many finished products fluctuates tre-
mendously with the business cycle, the demand at a given time is elastic,
and a smaller price reduction is needed to stimulate buying. Of more im- 
portance is the fact that many manufacturing industries are composed of 
large scale enterprises, which are able, acting individually or collectively, 
to restrict the output and to maintain price. 

Whatever the economic factors involved, we are forced to the conclu-
sion that a frequency distribution of price relatives is not homogeneous
in character. It is a compound of separate frequency distributions, each
with a characteristic mode. Under conditions of extreme economic dis-
location these different modes draw far apart, with the result that the
compound distribution is flat topped, as is the case in 1932. As the dif- 
ferent elements draw more nearly into the same relationship that existed 
in the base period, the different modes draw closer together and the central 
tendency becomes more marked. (This does not mean that the price 
system has been brought into equilibrium, a condition which would exist
if no one any longer were to make changes in his rate or method of pro-
duction, his schedule of purchases, or his price bids or offers.)

A second point to observe is the dispersion. It tends to become greater 
as the distance from the base period increases, although this tendency is 
counteracted to some extent as the median value approaches that of the
base period. Chart 209 shows that by 1936 the 1926 price relationships 
had been partially reestablished but at a lower level. Perhaps if the 1926
price level were to be reached, the 1926 price relationships would again be
destroyed. There is also the possibility, not clearly shown by Chart 208,
that dispersion may vary with the different phases of the business cycle.

Chart 209. Major Subdivisions of United States Bureau of Labor Statistics Index of
Wholesale Prices, 1926-1937. Sections A and B; vertical axes in per cent, horizontal
axis 1926 to 1937. (For source of data see Table 136.)

Still a third point to notice is the shape of the distributions of Chart 
208. During 1927 and 1928 the skewness (judged from the relative posi-
tions of the deciles) appears to be positive. Many persons are of the
opinion that this is an inherent characteristic of frequency distributions
of price relatives, since they can increase indefinitely, while a selling price
can decline only to zero. On the other hand, it may be suggested that
price relatives are dominated more by the laws of economics than those of
mathematics. The limits of price advances or price declines are certainly
influenced by the willingness of persons to buy at different prices. Fur-
thermore, the direction of price change has something to do with the sign
of the skewness. Beginning with 1929 the skewness begins to be negative,
and the price level from this year on is definitely below that of 1926. The
explanation may be that price changes are to a great extent a result of
sensitive competitive prices changing to a varying extent, while managed
prices tend to be sluggish. 
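
Judging skewness from the relative positions of the deciles can be reduced to a single figure. The sketch below, in Python, compares the distance from the median to the ninth decile with the distance from the first decile to the median, and expresses the difference as a fraction of the whole spread. This particular measure is one common decile-based coefficient and is our choice for illustration, not a formula given in the text; the decile readings are hypothetical and are not read from Chart 208.

```python
def decile_skewness(d1, d5, d9):
    """Positive when the upper tail (D9 - D5) is longer than the lower tail (D5 - D1)."""
    spread = d9 - d1
    if spread == 0:
        raise ValueError("first and ninth deciles coincide")
    return ((d9 - d5) - (d5 - d1)) / spread


# Hypothetical readings for a year of rising prices and a year of falling prices.
rising = decile_skewness(90.0, 98.0, 112.0)
falling = decile_skewness(45.0, 66.0, 78.0)

print(f"rising year:  {rising:+.2f}")   # upper tail longer, so the sign is positive
print(f"falling year: {falling:+.2f}")  # lower tail longer, so the sign is negative
```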

In this chapter, building material prices have been chosen as material 
to illustrate index number construction, and the price quotations used are 
from those compiled by the United States Bureau of Labor Statistics. 
There are seven subgroups of building material prices: (1) brick and tile;
(2) cement; (3) lumber; (4) paint and paint materials; (5) plumbing and 
heating; (6) structural steel; (7) other building materials. 

From what has already been said, we should expect building material 
price relatives to form frequency distributions with characteristic central 
tendencies. Some of the factors which relate building prices to each other 
are as follows: Some building materials are complementary products, such
as sand and cement; others are substitutes for each other, as brick and 
lumber for house exteriors; still others are joint products, as sand and 
gravel. In some cases a change in the price of one commodity affects 
another in the same direction, in other cases in an opposite direction. On 
the whole it seems that the forces making for uniformity of price move- 
ment are much stronger than those making for diversity. Not only should 
we expect a central tendency for building materials different from that of 
commodity prices as a whole, but we should expect less variation among 
the price relatives. 

In general these expectations are fulfilled. Thus we find that the median 
price relative of the 102 building materials in 1932 is 74.6, as compared 
with 66.2 for all commodities (as represented by the commodities in the 
United States Bureau of Labor Statistics wholesale price index). Further- 
more, the median of each of the building material subgroups in this year 
is higher than 66.2, with the exception of lumber, which is 64.1. Likewise,
we find the spread between the first and ninth deciles to be much smaller 
for the building materials group and for each of its subgroups than for all 
commodities, except in the case of paint and paint materials. Although 
we find the same tendencies in regard to skewness for building materials 
that we do for commodities in general, we find the distribution more 
peaked (that is, with a more pronounced central tendency). Though most
of the separate subgroups contain too few items to yield readily much
information concerning the shape of the distribution of their price relatives,
the case of paint and paint materials is interesting. As may be seen by
inspection of Chart 210, the range between the fifth and seventh deciles
is very narrow, indicating a high degree of uniformity in movement for
the most closely bunched fifth of the commodities. It is interesting to
note also that until 1931 the seventh decile remained at 100. In fact,
there were four of the 29 commodities which did not change in price until
1931, and one (bone black) that is still at its 1926 level. Although the
prices of a considerable proportion of the commodities seem to be rather
rigid, there are many which are very sensitive in their fluctuations. This
is indicated on the chart by the wide variation from the median of the
first and ninth deciles. As a striking illustration we find shellac, which
rose to 138.6 per cent in 1927 and fell to 30.2 per cent in 1932. As in the
case of the all-commodity distribution, we find the paint and paint mate-
rials distribution to be negatively skewed during the depression years.

Chart 210. Deciles of Price Relatives of Paint and Paint Materials and Price
Relative of Outside Gloss White Paint (Dotted Line), 1926-1937. Vertical axis in
per cent. (For source of data see Table 136. Prices are average prices for each year.)

Data for Index Numbers 

Although the method of combining the variables is of considerable im- 
portance in constructing index numbers, it is insignificant when compared 
with the problem of selecting the data that are the raw materials of the 
index. Too much emphasis cannot be put upon this point. The data 
must be accurate and comparable, and the sample representative. A 
sample cannot be expected to be representative unless an adequate num- 
ber of items is included. To state the idea in other language: a sufficiently 
large sample of relevant items must be selected to obtain reliable index 
numbers. 

Needless to say, the commodities to be chosen for a price index, and the 
type of quotation to be selected, depend on what is being measured. A 
wholesale price index requires wholesale prices. A cost-of-living index 
necessitates not only retail prices of food, but rents, gas and electric rates, 
etc., applying to the class of persons for whom the cost of living is to be 
ascertained. An index of the changing cost of constructing frame houses 
in Atlanta, Georgia, should include those materials and items of labor 
that are used in frame houses built in Atlanta, and the prices should be 
the Atlanta prices of those materials, or the wages in Atlanta of the kind 
of labor used. 

When selecting the sources of data for index numbers, we may rely on 
regularly published quotations or obtain periodic special reports from the 
merchants, producers, exporters, or others who possess the basic informa- 
tion needed. Under either circumstance we must make sure that the 
data pertain strictly to the thing being measured. Thus, if retail price 
changes are being measured, a quotation might be from a market place, 
an independent store, a department store, a mail order retail store, a 
manufacturer's outlet, etc. These different sources should not be mixed 
indiscriminately. Neither should first of the month quotations, middle 
of the month quotations, and end of the month quotations ordinarily 
be combined in one index. 

The discussion immediately following is in part an application of prin- 
ciples discussed in earlier chapters of this book, especially Chapters II 
and XII. The great importance of the proper choice of data for index
numbers justifies a bringing together of these principles, even though slight 
duplication is involved. 
Accuracy. Some of the statistical data that appear in precise printed
form cannot be depended upon. If the person or company reporting the 
data uses the data, they are likely to be accurate; but if the data are 
merely statistical reports furnished to an outside agency, they may be 
compiled originally by careless and indifferent clerks whose sole interest 
is in filling the form with ink marks as quickly as possible. It therefore
behooves the statistician to ascertain how the data are collected, and to 
select his source with discrimination. 

Comparability. Standard grades of the same commodity are, of course, 
comparable between different dates; however, a 1908 automobile cannot 
be compared with a 1938 automobile. The Automobile Manufacturers 
Association compiles an index of the price per pound of automobiles! If
the object is to compare the amount of money that must be spent in dif-
ferent years to purchase an automobile which provides an equal amount
of “utility” each year, the above index has a decided upward bias, since
there has been a gradual increase in utility per pound. Nor is it easy to
see how the price of a “standard” automobile could be computed for dif-
ferent years, since in not more than one year could such a standard auto-
mobile ordinarily be found. The upward bias of price quotations is great-
est in the case of highly processed manufactured goods; it is present also
in the case even of some agricultural commodities. It is likely, therefore, 
that most price index numbers have an upward bias. 

A similar problem arises when one article passes out of wide use and its 
place is taken by a different commodity serving somewhat the same use. 
For instance, the stagecoach of 100 years ago has been superseded by the 
streamlined air-conditioned train. If we should find that the fare from 
Washington, D. C., to Philadelphia were the same in the two periods, we
should not conclude that the cost of the same service had remained the 
same. Think of the saving of time required to make the trip, and the 
added comfort of travel provided by the modern streamlined, air-conditioned 
train. This problem, however, is not so difficult as the gradual change in
quality of the same product. For in the period between 1830 and 1930 
there was a time when both the stagecoach and the train were in use; at 
that time, change in cost of transportation could be measured and a sub- 
stitution of trains for stagecoaches could be made for subsequent com- 
parisons. But how should we compare the price of amusements in one 
country, which goes in for pulque and bull fights on a large scale, with 
that of another country which indulges mainly in beer and Wagner operas? 

Representativeness. Since index numbers are usually obtained from 
samples, we must try to obtain a sample that behaves like the population 
from which it is drawn. Probably the most satisfactory way of accom-
plishing this is to divide the original data into groups and subgroups
and to draw a representative sample from each of these. Certain statis-
tical tests (which are described in this section) may be applied to the
entire sample so selected, to determine its representative character as a 
whole. 

As previously stated, we should expect different groups of commodities 
affected by different economic factors to display characteristic patterns of 
behavior. For example, we should expect price (and quantity) move- 
ments of foods to be different from those of building materials. The de- 
mand for food products is inelastic, while that for building materials (classi- 
fied as durable goods, the purchase of which can be postponed) is elastic. 
On the other hand, the supply of foods over a short period of time is de- 
pendent to a considerable extent on the weather, while the supply of build- 
ing materials is subject to conscious control of the fabricators. Likewise, 
we should expect the price fluctuations of any group to be distinctive and 
their average movements to differ from those of all commodities taken as 
a whole. So also we should expect each subgroup of a particular group 
(such as the paint and paint materials subgroup of the building materials 
group) to exhibit characteristic fluctuations. Furthermore, the statistical 
data presented earlier in this chapter indicate that the facts are in accord- 
ance with the theory. 

For the building materials illustration running through this chapter, 
seven commodities have been selected, one for each of the seven subgroups 
of which building materials are comprised. (Of course, selection of only 
one commodity for each subgroup is a great oversimplification of the prob-
lem, and is for illustrative purposes only.) The different groups and the
commodity representative for each, together with the unit of price quota-
tion for each, are as follows:



     Subgroup                      Representative                     Unit                Place of
                                                                                          quotation

1.   Brick and tile                Common building brick              1,000               Plant
2.   Cement                        Portland cement                    barrel              Plant
3.   Lumber                        Hard maple, No. 1                  1,000 board feet    Chicago
4.   Paint and paint materials     Outside white gloss house paint    gallon              Plant
5.   Plumbing and heating          Lavatories                         each                Factory
6.   Structural steel              Structural steel                   100 pounds          Mill
7.   Other building materials      Building gravel                    ton                 Plant


In selecting the representative commodity for each subgroup, the object 
was primarily to choose that commodity the price behavior of which was 
fairly representative of the subgroup from which it was selected. Thus 
in Chart 210, the dotted line representing outside white gloss house paint
is seen to stay rather close in most years to the narrow band which en- 
closes the typical items in the subgroup. It cannot be claimed that it is 
feasible to select one commodity which will adequately represent a group,
but if it is possible to select only a limited number of items, preference
should be given to those that conform most closely to the central tendency 
of the group. 

Having selected commodities that are, individually or collectively, fairly 
representative of the group from which they were selected, it remains to 
be seen whether proportionate representation is obtained for each group. 
Table 137 indicates the extent to which each subgroup is represented in
the sample as compared with the population. Extent of representation
is measured by the value marketed in 1926, the base year.

TABLE 137

Ratio of Sample Value to Population Value of Each Subgroup of Building
Material Commodities, 1926

                              Thousands of dollars      Per cent of total      Ratio of
                             ----------------------   ---------------------    sample to
Subgroup                     Population      Sample   Population     Sample    population
                                                                               (per cent)

Brick and tile                  380,031     103,286        8.5        14.2        27.2
Cement                          260,803     260,803        5.9        35.9       100.0
Lumber                        1,358,705      49,104       30.5         6.8         3.6
Paint and paint materials       634,869      87,746       14.3        12.1        13.8
Plumbing and heating            281,213      21,927        6.3         3.0         7.8
Structural steel                148,868     148,868        3.3        20.5       100.0
Other building materials      1,390,395      54,386       31.2         7.5         3.9

Total                         4,454,884     726,120      100.0       100.0        16.3

Source: See Table 136.

It appears that the representation of brick and tile, of cement, and of 
structural steel should be reduced, and that of the other four increased. 
It would be a simple matter to select additional commodities for these 
four subgroups; however, there is a difficulty in reducing the importance 
of cement and structural steel, each of which is represented 100 per cent.
Since it is ordinarily impracticable to have each sample group co-extensive 
with the entire population, the remedy must be found in the weighting of 
the commodities. 
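
The comparison drawn from Table 137 can be reproduced with a short Python sketch. The dollar figures are those of the table; the last quantity computed, the ratio of population share to sample share, is a hypothetical illustration of how weighting might offset unequal representation and is not a weighting system prescribed in the text.

```python
# Value marketed in 1926, thousands of dollars (Table 137): (population, sample).
table_137 = {
    "Brick and tile": (380_031, 103_286),
    "Cement": (260_803, 260_803),
    "Lumber": (1_358_705, 49_104),
    "Paint and paint materials": (634_869, 87_746),
    "Plumbing and heating": (281_213, 21_927),
    "Structural steel": (148_868, 148_868),
    "Other building materials": (1_390_395, 54_386),
}

pop_total = sum(pop for pop, _ in table_137.values())
sample_total = sum(sample for _, sample in table_137.values())

rows = []
for name, (pop, sample) in table_137.items():
    pop_share = 100.0 * pop / pop_total            # per cent of population total
    sample_share = 100.0 * sample / sample_total   # per cent of sample total
    coverage = 100.0 * sample / pop                # ratio of sample to population
    # Over-represented when the share of the sample exceeds the share of the population.
    status = "over" if sample_share > pop_share else "under"
    # Hypothetical corrective weight: population share divided by sample share.
    weight = pop_share / sample_share
    rows.append((name, pop_share, sample_share, coverage, status, weight))

for name, pop_share, sample_share, coverage, status, weight in rows:
    print(f"{name:26s}{pop_share:7.1f}{sample_share:7.1f}{coverage:8.1f}  {status:5s}{weight:6.2f}")

print(f"{'Total':26s}{100.0:7.1f}{100.0:7.1f}{100.0 * sample_total / pop_total:8.1f}")
```

A weight below 1 marks a subgroup whose importance in the sample would have to be reduced, and a weight above 1 one whose importance would have to be increased.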

A further test of the representativeness of the sample can sometimes be 
applied: Do the value changes of the sample coincide with those of the 
population? This test should be applied not only to the whole sample, 
but to the various groups and subgroups into which it is divided.⁵

⁵ This test is similar to Irving Fisher’s “total value criterion,” which states that the
price index multiplied by the quantity index should equal the ratio of change of the
total value of the population. See Irving Fisher, “The Total Value Criterion,” Journal
of the American Statistical Association, Vol. XXII, December 1927, pp. 419-441.
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Adequacy* It has been shown that the reliability of a mean of a random 
sample increases with the square root of the number of items included. 
Likewise, the larger the proportion of items included, the more reliable is 
the mean.® It would appear, then, that we should ordinarily select some 
of the more important items first, and as many other suitable items as 
resources will permit. The absolute number of items to use is a question 
which cannot be answered in general terms. 

Selection of Base 

Regardless of the formula employed for weighting and combining the 
data, it is customary (although not necessary) to select some period of 
time as 100 per cent with which to compare the other index numbers. A 
month is unquestionably too short a period to use as base period, since 
any one month is likely to be unusual on account of accidental or seasonal 
influences. A year, however, is often used. Before the World War, 1913
was the base of the United States Bureau of Labor Statistics index of
wholesale prices; but more recently it has been shifted to 1926. At the
time the shift was made, it was the consensus of opinion that the post-war
prices would stabilize themselves at about the 1926 level. Consequently
this year was looked upon as a good year to use as a basis for comparison.
However, it is probable that no one year is sufficiently “normal” to be a
good basis of comparison. Business and prices are always advancing or
receding with the business cycle. Though not so specific, an average of
several years is a better base. The period 1910 through 1914 has sometimes
been used as a price base, while the 1923-1925 average is often used
for quantity indexes. A useful solution is to employ the period of years 
that is used by some of the other indexes with which the one being con- 
structed is likely to be employed.* 

Although a particular base may be satisfactory for a number of years, 
that base becomes less meaningful as time passes, and it eventually becomes 
desirable to shift to a more recent period. The reasons are: (1) the dis- 
persion of price relatives becomes so great that no average is reliable; 
(2) the pattern of consumption changes to such an extent that no aggregate 
of commodities can be found which includes the major expenditures common
to both periods; (3) the quality of many commodities, nominally the
same, progressively changes with time. An indirect basis of comparison 
may be had by utilizing a chain index system. This method, which is not 
completely satisfactory, will be explained in the following chapter. 


® It should be noted that the sample used for an index number is generally a stratified 
sample, and that the items from each stratum are not drawn at random. Consequently, 
ordinary reliability formulae are not applicable. 

The statistical agencies of the United States government are now computing several
indexes on a 1935-1939 base.



TABLE 138 

Construction of Simple Aggregative Index Numbers of Building Material Prices, 1926-1937

(Prices and values in dollars) 






Source: See Table 136. Figures for \(p_n q_0\) values are quantities multiplied by prices of Table 138.

Aggregative Price Index Numbers

It has already been stated that there are two methods of constructing
index numbers: (1) by computing aggregate values; (2) by averaging relatives.
By the first method, as will be explained in this section, the prices
or quantities are made comparable, are automatically weighted by being 
reduced to dollar values, and then are combined into aggregate values. 
In the following section the method of averaging relatives will be explained. 
There it will be shown that usually the two methods are merely alternative 
methods of obtaining the same result. The aggregative method obtains 
the result directly, and produces a result that has a simple and clear mean- 
ing; the other method is more roundabout, and its meaning is more tech- 
nical. Nevertheless, there are situations in which the aggregative method 
is not applicable, and recourse must then be had to the averaging of rela- 
tives. 

Simple aggregates. Table 138 illustrates the construction of a simple 
aggregative price index. The prices of each commodity in any given year 
are merely added together to give the index number for that year. It is 
then frequently convenient to designate some year as a base, which is set 
equal to 100. In this illustration all of the index numbers are expressed 
in the final column as a percentage of the 1926 number, found by dividing 
each one of the numbers by the value in the base period ($88.811) and
multiplying by 100. 
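The arithmetic of a simple aggregative index can be sketched in a few lines of Python. The sketch below uses only the 1926 and 1931 unit prices of the seven building materials as they appear in Table 140 of this chapter (read with three decimal places); the function name is illustrative.

```python
# Unit prices in dollars for the seven building materials, 1926 and 1931,
# as listed in Table 140.
PRICES = {
    "Common building brick": {1926: 13.913, 1931: 12.396},
    "Portland cement": {1926: 1.744, 1931: 1.385},
    "Hard maple, No. 1": {1926: 55.673, 1931: 37.802},
    "Outside white gloss house paint": {1926: 2.208, 1931: 1.930},
    "Lavatories": {1926: 12.374, 1931: 10.506},
    "Structural steel": {1926: 1.958, 1931: 1.627},
    "Building gravel": {1926: 0.941, 1931: 0.836},
}


def simple_aggregative_index(prices, given_year, base_year):
    """Sum of given-year unit prices as a percentage of the sum of base-year unit prices."""
    given_total = sum(by_year[given_year] for by_year in prices.values())
    base_total = sum(by_year[base_year] for by_year in prices.values())
    return 100 * given_total / base_total


base_aggregate = sum(by_year[1926] for by_year in PRICES.values())
index_1931 = simple_aggregative_index(PRICES, 1931, 1926)

print(f"1926 aggregate of unit prices: ${base_aggregate:.3f}")
print(f"1931 simple aggregative index (1926 = 100): {index_1931:.1f}")
```

Note that no quantities enter the calculation: each commodity counts in proportion to the price of whatever unit happens to be quoted.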

In Chart 211A are shown the seven components of this index and, at
the top, the index itself. The vertical distance between the different
curves gives some idea of their relative importance.^ This chart shows
in striking fashion why the index follows rather closely that of hard maple.
That commodity was priced in 1926 four times as high per unit as the
product with the next highest unit price, building brick, and comprised
$55.673 / $88.811, or about 63 per cent, of the aggregate. This is far in
excess of its actual importance as a commodity used in building. It is apparent
that the influence which a commodity exerts on a simple aggregative index
depends on the price per unit of quotation. In this instance, hard maple
was the predominant item; if house paint were quoted at wholesale by the
barrel instead of by the gallon, that commodity would largely have determined
the course of the index.

^ This chart was drawn on ratio paper in order that the proportionate changes of
the different series might be compared with each other. Actually, the chart fails to
give full effect to the relative influence of the higher priced series, since the vertical
space between the curves is in proportion to the logarithms of the prices rather than
the actual prices.

Chart 211A. Unit Prices of Seven Building Materials and Simple Aggregative
Index Numbers, 1926-1937. (Data of Table 138.)

Chart 211B. Values of Base Year Quantities of Seven Building Materials and
Aggregative Index Numbers with Base Year Quantity Weights, 1926-1937.
(Data of Table 139.)

The weighting of an aggregative index by one commercial unit of each
commodity represented, then, is illogical in that it neglects to consider the
actual importance of the different commodities; it is haphazard in that the
relative influence of the different commodities is determined by factors quite
irrelevant to the purpose of the price index. The problem would in no sense
be solved if all commodities were reduced to a price per pound, for some
commodities, such as diamonds, are very costly per pound and yet are not very
important in our economic life, while coal, which is of tremendous importance,
is relatively cheap per pound. Furthermore, some goods, such as electric
power or human labor, cannot be reduced to a pound basis. Still another
solution is to take as the unit of quotation the amount that can be purchased
for one dollar in the base year. But this is scarcely more logical
since it would be very unusual if the same amount of money were spent
on each commodity in every year.
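The dependence of a simple aggregative index on the unit of quotation can be seen by requoting one commodity. The sketch below requotes house paint by the barrel instead of the gallon. The conversion factor of 50 gallons per barrel is purely hypothetical and is chosen only for illustration; the prices are those of Table 140.

```python
# Unit prices in dollars, 1926 and 1931, as listed in Table 140.
PRICES = {
    "Common building brick": {1926: 13.913, 1931: 12.396},
    "Portland cement": {1926: 1.744, 1931: 1.385},
    "Hard maple, No. 1": {1926: 55.673, 1931: 37.802},
    "Outside white gloss house paint": {1926: 2.208, 1931: 1.930},
    "Lavatories": {1926: 12.374, 1931: 10.506},
    "Structural steel": {1926: 1.958, 1931: 1.627},
    "Building gravel": {1926: 0.941, 1931: 0.836},
}
PAINT = "Outside white gloss house paint"
GALLONS_PER_BARREL = 50  # hypothetical conversion factor, for illustration only


def simple_index(prices, given_year=1931, base_year=1926):
    """Simple aggregative index: sum of given-year prices over sum of base-year prices."""
    given = sum(by_year[given_year] for by_year in prices.values())
    base = sum(by_year[base_year] for by_year in prices.values())
    return 100 * given / base


def requote(prices, commodity, factor):
    """Return a copy of the price table with one commodity quoted in a unit `factor` times larger."""
    requoted = {name: dict(by_year) for name, by_year in prices.items()}
    requoted[commodity] = {year: price * factor for year, price in prices[commodity].items()}
    return requoted


per_gallon_index = simple_index(PRICES)
per_barrel_index = simple_index(requote(PRICES, PAINT, GALLONS_PER_BARREL))
paint_relative = 100 * PRICES[PAINT][1931] / PRICES[PAINT][1926]

print(f"Paint quoted per gallon: {per_gallon_index:.1f}")
print(f"Paint quoted per barrel: {per_barrel_index:.1f}")
print(f"Price relative of paint alone: {paint_relative:.1f}")
```

Nothing about the market has changed between the two calculations, yet the index moves several points toward the price relative of paint, which is the haphazard weighting objected to above.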

Before consideration of the construction of weighted aggregative index
numbers, it may be helpful to state symbolically the method we have just
used. The formula is

\[ P = \frac{\sum p_n}{\sum p_0}, \]

where P means price index.

p refers to price of an individual commodity.

0 refers to the base period, from which price changes are measured.

n refers to the given period, the year being compared with the base.

Now if the formula for a particular year (say 1931) is to be stated, it could
be written

\[ P_{31} = \frac{\sum p_{31}}{\sum p_{26}}. \]

These are the notations used in Table 138.

Weighted aggregates. In order to allow each commodity to have a 
reasonable influence on the index, it is advisable to use a weighted rather 
than a simple (unweighted) aggregate of prices. To construct a weighted 
aggregative index, a list of definite quantities of specified commodities is 
taken, and calculations are made to determine what this aggregate of goods 
is worth each year at current prices. Obviously the process is merely that 
of multiplying each unit price by the number of units and summing the 
resulting values for each period. The procedure, using the quantities 
marketed in 1926 as multipliers, is illustrated in Table 139. The reader, 
having followed the reasoning to this point, will realize now that aggregative 
index numbers of price measure the changing value of a fixed aggregate of 
goods. Since the total cost or value changes while the components of the 
aggregate do not, these changes must be due to price changes. It appears 
that this type of index number measures the very thing sought if we wish 
to determine changes in the cost of living. The Philadelphia Rapid
Transit Company, therefore, chose this type as a guide to its wage policy.
The general formula for the aggregative price index is

\[ P = \frac{\sum p_n q}{\sum p_0 q}. \]

The symbols are those used earlier, but a new one has been added: q refers
to the quantity of the commodity produced, marketed, or consumed (that
is, the quantity weight, or multiplier). Since the index numbers constructed
in Table 139 were weighted by base year quantities, we may
write the formula more specifically

\[ P = \frac{\sum p_n q_0}{\sum p_0 q_0}. \]

If the reader will turn to Chart 211B, he will be struck by the wide
variation between the simple and the weighted index numbers. Furthermore,
the reason for the difference will at once be apparent. In the simple
aggregative type, hard maple, the commodity that declined most greatly,
is of dominating importance; whereas, when weights are introduced, cement
becomes most important, although no longer does any single commodity
exercise overwhelming influence on the course of the index. Nevertheless,
it is significant that the two most important commodities, cement and 
structural steel, are the two commodities that have most nearly regained 
their 1926 level. 
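A weighted aggregative index with base year quantities can be sketched as follows. The 1926 quantities are not tabulated in this section, so the sketch backs them out as value divided by price, using the 1926 sample values of Table 137 and the unit prices of Table 140. Pairing each subgroup of Table 137 with a single commodity is an assumption of this sketch; for building gravel it reproduces the 57,796 thousand tons quoted later in the chapter.

```python
# Each row: (1926 value marketed in thousands of dollars, 1926 unit price, 1931 unit price).
# Values: sample column of Table 137. Prices: Table 140.
# Assumption: each Table 137 subgroup is represented by the one commodity named here.
DATA = {
    "Common building brick": (103_286, 13.913, 12.396),
    "Portland cement": (260_803, 1.744, 1.385),
    "Hard maple, No. 1": (49_104, 55.673, 37.802),
    "Outside white gloss house paint": (87_746, 2.208, 1.930),
    "Lavatories": (21_927, 12.374, 10.506),
    "Structural steel": (148_868, 1.958, 1.627),
    "Building gravel": (54_386, 0.941, 0.836),
}


def base_quantities(data):
    """Back out 1926 quantities (thousands of units) as value divided by price."""
    return {name: value / p0 for name, (value, p0, _) in data.items()}


def weighted_aggregative_index(data):
    """100 * sum(p_n * q_0) / sum(p_0 * q_0), with q_0 taken from base_quantities."""
    q0 = base_quantities(data)
    given = sum(pn * q0[name] for name, (_, _, pn) in data.items())
    base = sum(p0 * q0[name] for name, (_, p0, _) in data.items())
    return 100 * given / base


def simple_aggregative_index(data):
    """100 * sum(p_n) / sum(p_0): one quoted unit of each commodity."""
    given = sum(pn for _, _, pn in data.values())
    base = sum(p0 for _, p0, _ in data.values())
    return 100 * given / base


weighted_1931 = weighted_aggregative_index(DATA)
simple_1931 = simple_aggregative_index(DATA)
gravel_q0 = base_quantities(DATA)["Building gravel"]

print(f"Simple aggregative index, 1931:   {simple_1931:.1f}")
print(f"Weighted aggregative index, 1931: {weighted_1931:.1f}")
```

The gap between the two figures comes entirely from the weights: hard maple dominates the simple aggregate but accounts for only a small part of the value marketed.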

Selection of weights. Although in the preceding illustration the quan- 
tities marketed in 1926 were used as weights, this simple procedure is but 
one of several possible systems. It would have been just as easy to have 
taken, say, 1927 quantities as weights. If the quantity of each commodity
marketed changed from year to year in the same proportion, it
would make no difference to what period the weights refer, for the results
would be identical. In fact, however, the relative importance of the different
commodities is constantly changing, and this is due in part to the change
in the relative prices of the different commodities. Therein lies a great
source of difficulty for which there is no completely satisfactory solution.
The answer depends in part on what the analyst thinks a price index is 
supposed to do. 
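The point that proportional changes in quantities leave the index untouched, while disproportionate changes do not, can be checked numerically. The sketch below uses the prices and the 1927 and 1929 average quantities of Table 140; the two altered sets of quantities are hypothetical and serve only as an illustration.

```python
# Table 140: 1926 prices, 1931 prices, and average 1927 and 1929 quantities (thousands).
P26 = [13.913, 1.744, 55.673, 2.208, 12.374, 1.958, 0.941]
P31 = [12.396, 1.385, 37.802, 1.930, 10.506, 1.627, 0.836]
Q = [6_348, 171_926, 794, 49_082, 1_590, 90_970, 80_666]
HARD_MAPLE = 2  # position of hard maple in the lists above


def aggregative_index(prices_base, prices_given, quantities):
    """100 * sum(p_n * q) / sum(p_0 * q) for one fixed list of quantities."""
    given = sum(p * q for p, q in zip(prices_given, quantities))
    base = sum(p * q for p, q in zip(prices_base, quantities))
    return 100 * given / base


original = aggregative_index(P26, P31, Q)

# Hypothetical case 1: every quantity 30 per cent larger (same proportions).
scaled = aggregative_index(P26, P31, [1.3 * q for q in Q])

# Hypothetical case 2: only hard maple tripled (proportions changed).
shifted_quantities = [3 * q if i == HARD_MAPLE else q for i, q in enumerate(Q)]
shifted = aggregative_index(P26, P31, shifted_quantities)

print(f"Original weights:         {original:.1f}")
print(f"All quantities scaled:    {scaled:.1f}")
print(f"Hard maple alone tripled: {shifted:.1f}")
```

Because hard maple fell in price more than the other commodities, giving it more weight pulls the index down; a uniform scaling of all quantities cancels out of the ratio.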

One view is that such an index number measures the changing cost of a 
constant aggregate of goods. Another view concerns itself not with the 
goods level of analysis but with the satisfactions level; an index number 
should measure the changing cost of aggregates of goods yielding the same 
utility or satisfaction at two periods, or two places. Thus, suppose we 
compare the cost of living of two groups of similar persons at two periods 
(or places), these groups having at the two periods (or places) the same
tastes and capacity for enjoyment, as well as an income that will purchase,
and does purchase, the same amount of satisfaction.® The commodities,
of course, will be different, but if the expenditures were $2,000 the first 
year and $2,400 the second year, we may conclude that the cost of living 
has gone up 20 per cent. It goes without saying that no one has accu- 
rately made a measurement of this kind. Although it seems feasible to 
measure only the varying value of a fixed aggregate of goods, yet the 
analyst should select a list of goods that will avoid the certainty of bias 
in a known direction with respect to the cost of obtaining equal satisfac- 
tions at different times. The following suggestions have been made for 
solving this knotty problem. 

1. Use base period quantities as weights. This is the method we have 
used for illustrative purposes in Table 139. However, even if there has 
been no change in the tastes or environment of purchasers between the 
two periods, purchases of those commodities that have increased relatively
in price will decline relatively, and purchases of commodities that have
decreased relatively in price will increase relatively. It is entirely possible
that this type of index might record an increase in the price level,
whereas by increasing the relative amounts purchased of commodities that
decline in price, the same amount of satisfaction might actually be bought
by a given individual at a lower total cost. This type of index, then,
has in a sense an upward bias. It might be said that this index marks
an upper limit to the price change. This method is sometimes known as
Laspeyres' method and, as previously stated, can be defined symbolically,

\[ P = \frac{\sum p_n q_0}{\sum p_0 q_0}. \]

2. Use given period quantities. That is, use the weights that pertain 
to the year the price level of which is to be compared with that of the 
base period. This method involves the selection of a new set of weights
each year, or even more often. But frequently it is impossible to obtain 
current quantity weights, and, even if they are available, the labor of 
computation is approximately doubled. Furthermore, although each pe- 
riod is thereby directly comparable with the base year, the comparison of 
the different years among themselves is not valid, for the reason that the 
aggregate of goods differs each year. 

If we think of 1926 as being the base period, the base year weighting 
system answers the question: If it cost me $100 a month to live in 1926, 
how much would it cost me this year to live the way I did that year? The 
given year weighting system answers a different question: If I could have
supported my present scale of living in 1926 with $100 per month, how
much must I spend this year? A theoretical objection to asking such a
question is that undue weight is given to the commodities that have declined
in price. It is the relative decline in price that may be responsible
for their increased purchase, and, although it is price change which we
are trying to measure, yet our weighting is partly determined by relative
price changes. Thus this method may be said to have a downward bias,
and marks the lower limit of price change. It is sometimes known as
Paasche's method and has the following formula:

\[ P = \frac{\sum p_n q_n}{\sum p_0 q_n}. \]

® See J. M. Keynes, A Treatise on Money, Vol. I, pp. 96-99. Harcourt, Brace &
Co., New York, 1930.

3. Use the average (or total) quantities of base and given years. This is
a compromise solution, although it is one which has no general bias in
any known direction. But again, as in method 2, we have shifting weights
and a resulting lack of comparability among the different years. The
method was proposed independently by the English economists Marshall
and Edgeworth, and the formula

\[ P = \frac{\sum p_n (q_0 + q_n)}{\sum p_0 (q_0 + q_n)} \]

is sometimes called the Marshall-Edgeworth formula.

4. Average together the quantities for all the years which the index numbers 
include. Though perhaps an excellent solution for a historical study, this 
plan is impracticable if the index is to be kept up to date, since it means 
current revision of weights and continuous recomputation of the complete 
set of index numbers. 

5. Average together the quantities of several years which are thought to be
typical. This again is a compromise solution, but it is practical and is
very frequently adopted. The list of quantities used will, however, eventually
become obsolete. When that is the case, a new index can be constructed
and spliced to the old one. Methods for so doing will be considered
in the following chapter. The construction of an index number
of 1931 building material prices, using as weights the average quantity
marketed in 1927 and 1929, is illustrated in Table 140. The index number
varies only one-tenth of a point from that employing base year weights.
The formula for this particular index number may be written

\[ P = \frac{\sum p_n q_{27,29}}{\sum p_0 q_{27,29}}. \]

6. Determine the highest common factor. The weights are the quantities
of each commodity common to each year, either to the base and given
year, or to all the years under comparison. In the latter case this would
mean that, for any commodity, the smallest amount marketed in any of
the years under comparison would be taken. Usually, then, the quantities
of the different commodities taken would not each be for the same year.
This ingenious device has been suggested by J. M. Keynes^ to avoid the
sort of bias inherent in methods 1 and 2, already described. Its virtue
is its modesty: the device avoids trying that which cannot be done perfectly.
However, if the values of quantities that are common to the different
periods are small compared with total expenditures, or if they constitute
in different periods a varying per cent of the total, or if the satisfaction
derived from this aggregate of goods varies, the method is no more
accurate and, quite likely, is less accurate than method 5.

TABLE 140

Construction of 1931 Weighted Aggregative Index Number of Building
Material Prices, Using 1927 and 1929 Average Quantity Weights

(Quantities and values in thousands)

                                  Average                            1927, 1929 quantities at:
                                  quantity     1926       1931
                                  marketed     price      price        1926           1931
                                  in 1927      per unit   per unit     prices         prices
Commodity                         and 1929
                                  q_27,29      p_26       p_31      p_26 q_27,29   p_31 q_27,29

Common building brick                6,348    $13.913    $12.396      $ 88,320       $ 78,690
Portland cement                    171,926      1.744      1.385       299,839        238,118
Hard maple, No. 1                      794     55.673     37.802        44,204         30,016
Outside white gloss house paint     49,082      2.208      1.930       108,373         94,728
Lavatories                           1,590     12.374     10.506        19,675         16,705
Structural steel                    90,970      1.958      1.627       178,119        148,008
Building gravel                     80,666       .941       .836        75,907         67,437

Aggregate value                                                       $814,437       $673,701
Index number                                                             100.0           82.7

Source: See Table 136.

7. Make two index numbers, each with a different set of weights, and
average the two together, usually geometrically. The two systems of weighting
chosen are ordinarily base and given year weights. The formula then
becomes

\[ P = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}. \]

It is frequently called Fisher's “ideal” index number, because it conforms
to certain tests of consistent behavior which Irving Fisher considers appropriate.^^
On the other hand, it is difficult to say precisely just what
such an index number does measure.

® Ibid., pp. 105-109.
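The formulas of methods 1, 2, 3, and 7 can be compared side by side in a short Python sketch. Given year quantities for the building material data are not tabulated here, so the sketch uses a hypothetical two-commodity case in which buyers shift toward the commodity that has become cheaper; all numbers below are invented for illustration.

```python
from math import sqrt


def _aggregate(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, pn, q0, qn):
    """Method 1: base period quantities as weights."""
    return 100 * _aggregate(pn, q0) / _aggregate(p0, q0)


def paasche(p0, pn, q0, qn):
    """Method 2: given period quantities as weights."""
    return 100 * _aggregate(pn, qn) / _aggregate(p0, qn)


def marshall_edgeworth(p0, pn, q0, qn):
    """Method 3: total of base and given period quantities as weights."""
    total = [a + b for a, b in zip(q0, qn)]
    return 100 * _aggregate(pn, total) / _aggregate(p0, total)


def fisher_ideal(p0, pn, q0, qn):
    """Method 7: geometric mean of the base-weighted and given-weighted indexes."""
    return sqrt(laspeyres(p0, pn, q0, qn) * paasche(p0, pn, q0, qn))


# Hypothetical data: the first commodity doubles in price and is bought less;
# the second halves in price and is bought more.
p0, pn = [1.00, 1.00], [2.00, 0.50]
q0, qn = [10, 10], [5, 20]

results = {
    "Laspeyres": laspeyres(p0, pn, q0, qn),
    "Paasche": paasche(p0, pn, q0, qn),
    "Marshall-Edgeworth": marshall_edgeworth(p0, pn, q0, qn),
    "Fisher ideal": fisher_ideal(p0, pn, q0, qn),
}
for name, value in results.items():
    print(f"{name:20s} {value:6.1f}")
```

In this invented case the base-weighted index lies above and the given-weighted index below the two compromise formulas, which is the pattern of upper and lower limits described under methods 1 and 2.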

A general criticism of any weighting system which involves the use of a
different set of weights for each index number is that, although each index
number may validly be compared with that of the base year, logically the
index numbers of no other two years (such as 1936 and 1937) can be compared
with each other. This criticism applies to given year weights, to
the average of base and given year weights, to the highest common factor
method when the quantities selected are common only to the two years
being compared, and to the “ideal” index number. It does not apply to
base year weights, average weights of all years, typical weights, or the
highest common factor method when the quantities common to all years
are used.

Although the theory of weight selection is interesting and involves logical 
analysis of a high order, it is easy to overestimate its practical importance. 
Consider the following results obtained from the building material data:

                                                     1931
System of weighting                              index number

Simple                                               74.9
1926 quantity weights (base year weights)            82.6
1927 and 1929 average quantity weights               82.7
1931 quantity weights (given year weights)           82.5
“Ideal” index number                                 82.6


In this case there is a very great difference between the simple and the 
weighted index numbers, but practically no difference between the systems 
of weighting. If, however, both the prices and quantities had varied 
greatly in their relative magnitude, the different weightings might have 
given markedly different results. Furthermore, it is usually of slight im- 
portance whether exact weights are used, or only approximate weights. 
Thus, Table 141 is exactly like Table 140 except that the quantity weights 
are rounded to one digit, but the results vary by only two-tenths of a 
point. For all practical purposes, sufficiently accurate results will usually 
be obtained if exact weights are given to the few more important commodities,
and rounded weights to the numerous unimportant commodities.^^
Although only approximate accuracy is necessary in choosing weights, 
accuracy in price quotations is, in practice, of much greater importance. 
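The comparison between exact and rounded weights can be reproduced from the figures of Table 140. The sketch below rounds each average quantity to one significant digit, as in Table 141, and recomputes the 1931 index; the helper names are illustrative.

```python
from math import floor, log10

# Table 140: 1926 prices, 1931 prices, and average 1927 and 1929 quantities (thousands).
P26 = [13.913, 1.744, 55.673, 2.208, 12.374, 1.958, 0.941]
P31 = [12.396, 1.385, 37.802, 1.930, 10.506, 1.627, 0.836]
Q_EXACT = [6_348, 171_926, 794, 49_082, 1_590, 90_970, 80_666]


def round_to_one_digit(quantity):
    """Round a positive quantity to one significant digit, e.g. 6,348 -> 6,000."""
    magnitude = 10 ** floor(log10(quantity))
    return round(quantity / magnitude) * magnitude


def aggregative_index(prices_base, prices_given, quantities):
    """100 * sum(p_n * q) / sum(p_0 * q) for one fixed list of quantities."""
    given = sum(p * q for p, q in zip(prices_given, quantities))
    base = sum(p * q for p, q in zip(prices_base, quantities))
    return 100 * given / base


Q_ROUNDED = [round_to_one_digit(q) for q in Q_EXACT]
exact_index = aggregative_index(P26, P31, Q_EXACT)
rounded_index = aggregative_index(P26, P31, Q_ROUNDED)

print("Rounded weights:", Q_ROUNDED)
print(f"Index with exact weights:   {exact_index:.1f}")
print(f"Index with rounded weights: {rounded_index:.1f}")
```

Although the rounding changes the weight of lavatories by roughly a quarter and that of cement by about a sixth, the index moves by only about two-tenths of a point.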


See Irving Fisher, The Making of Index Numbers, p. 220, Houghton Mifflin Company,
Boston, 1927. In Chapter IV Professor Fisher discusses these tests.

Irving Fisher recommends that the quantities be rounded to 1, 10, 100, or 1,000.
This, of course, materially lightens the work. In rounding any quantity between 1
and 10 (for instance), the dividing point is not the arithmetic mean of these two numbers,
but the geometric mean, 3.1623, since this involves the smallest relative error.
See ibid., pp. 346 and 432.

If all prices moved in the same direction and at the same rate, it would
make no difference what system of weighting were chosen. We have 
found that, in fact, distributions of price relatives do display a central 
tendency. But if it so happens that commodities which are changing 
greatly in relative importance during the period are also undergoing price 
changes materially different from the average, then the matter of weighting 
becomes important. 

Over a number of years various changes take place: commodities shift
considerably in their relative importance; old commodities disappear from 

TABLE 141

Construction of 1931 Aggregative Index Number of Building Material Prices,
Weighted by 1927 and 1929 Average Quantities Rounded to One Digit

(Quantities and values in thousands)

                                  Average                            1927, 1929 quantities at:
                                  quantity     1926       1931
                                  marketed     price      price        1926           1931
                                  in 1927      per unit   per unit     prices         prices
Commodity                         and 1929
                                  q_27,29      p_26       p_31      p_26 q_27,29   p_31 q_27,29

Common building brick                6,000    $13.913    $12.396      $ 83,478       $ 74,376
Portland cement                    200,000      1.744      1.385       348,800        277,000
Hard maple, No. 1                      800     55.673     37.802        44,538         30,242
Outside white gloss house paint     50,000      2.208      1.930       110,400         96,500
Lavatories                           2,000     12.374     10.506        24,748         21,012
Structural steel                    90,000      1.958      1.627       176,220        146,430
Building gravel                     80,000       .941       .836        75,280         66,880

Aggregate value                                                       $863,464       $712,440
Index number                                                             100.0           82.5

Source: Table 140.


use and are succeeded by new commodities; models, styles, or grades of a 
commodity become obsolete and cease to be manufactured, with new 
models, styles, or grades taking their place; marketing centers shift, so 
that a price quotation at the new center must replace that at the old; 
f.o.b. price quotations may give way to delivered prices. Under any of 
these circumstances it may be desirable to express each index number, not
as a percentage of the original base, but as a percentage of the preceding 
period. Such a link relative index number might employ any of the 
formulae given above, utilizing weights pertaining to either or both of the 
years or months being compared. Frequently these separate percentages 
(link relative index numbers) are chained back to the original base by a 
process of successive multiplication. Such an index, known as a chain
index, will be further described in the following chapter. Overlapping
price data are needed for only a single period, as a direct comparison is
made only between the prices of the current period and those of the preceding
period.
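The chaining of link relatives by successive multiplication can be sketched briefly. The link relatives below are hypothetical figures chosen for illustration; each expresses one period as a percentage of the period immediately preceding it.

```python
def chain_index(link_relatives, base=100.0):
    """Chain link relatives back to the original base.

    Each link relative is the index of one period as a per cent of the
    preceding period; the chained series starts at `base` and multiplies
    the links in succession.
    """
    chained = [base]
    for link in link_relatives:
        chained.append(chained[-1] * link / 100)
    return chained


# Hypothetical link relatives for three successive periods after the base.
links = [96.0, 105.0, 90.0]
chain = chain_index(links)

for period, value in enumerate(chain):
    print(f"Period {period}: {value:.2f}")
```

Because each link compares only adjacent periods, the list of commodities or the set of weights may be revised at any link without disturbing the links already computed.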

Averages of Price Relatives 

A brief illustration will indicate the method of obtaining index numbers 
by averaging price relatives. 

1. Reduce the actual prices to a percentage of the base period. The percentages
are called price relatives, since they are expressed not as dollars
and cents but as percentages relative to the price during a certain period.
Table 142 shows the price relatives for our seven building materials from
1926 through 1937. Each of these series of relatives was computed in the
same manner as were the relatives for common building brick in Table 136.

Chart 212. Price Relatives and Simple Arithmetic Average Index Numbers of Seven
Building Materials, 1926-1937. (Data of Table 142.)

TABLE 142

Construction of Index Numbers of Building Material Prices by Simple Arithmetic Means of Price Relatives, 1926-1937

Source: See Table 136. Figures for 1927-1937 are derived from \(p_{26} q_{26}\) column and relatives of Table 142.

2. Average the price relatives for each year separately, thus obtaining a
series of index numbers. In Table 142 a simple arithmetic mean is used.
Chart 212 shows the movement of the individual price relatives, together
with the average movement. It is, of course, possible to use other types
of averages, such as the harmonic mean, the geometric mean, or the median.
If a weighted average is used, the weights are value weights, as
contrasted with the aggregative method, which makes use of quantity
weights.
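The two steps just described can be sketched for a single year. The sketch below computes the 1931 price relatives of the seven building materials from the unit prices of Table 140 and then averages them in several ways with Python's standard `statistics` module; it is an illustration only and does not reproduce the full Table 142.

```python
from statistics import geometric_mean, harmonic_mean, mean, median

# Unit prices in dollars, as listed in Table 140.
P26 = {
    "Common building brick": 13.913,
    "Portland cement": 1.744,
    "Hard maple, No. 1": 55.673,
    "Outside white gloss house paint": 2.208,
    "Lavatories": 12.374,
    "Structural steel": 1.958,
    "Building gravel": 0.941,
}
P31 = {
    "Common building brick": 12.396,
    "Portland cement": 1.385,
    "Hard maple, No. 1": 37.802,
    "Outside white gloss house paint": 1.930,
    "Lavatories": 10.506,
    "Structural steel": 1.627,
    "Building gravel": 0.836,
}

# Step 1: express each 1931 price as a percentage of its 1926 price.
relatives = {name: 100 * P31[name] / P26[name] for name in P26}

# Step 2: average the relatives (unweighted) in several ways.
values = list(relatives.values())
arithmetic = mean(values)
geometric = geometric_mean(values)
harmonic = harmonic_mean(values)
middle = median(values)

for name, relative in relatives.items():
    print(f"{name:32s} {relative:6.1f}")
print(f"Arithmetic mean {arithmetic:.1f}, geometric mean {geometric:.1f}, "
      f"harmonic mean {harmonic:.1f}, median {middle:.1f}")
```

Unlike the simple aggregate of prices, the simple mean of relatives does not depend on the unit of quotation, since each relative is a pure percentage.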

Type of average. A fairly good theoretical argument can be built up
for the use of the geometric mean in averaging price relatives. Let us 
assume the simple case of measuring the difference in price level between 
two countries, in which two commodities only are used. 



                     Country A                      Country B

Commodity     Unit price   Price relative    Unit price   Price relative
                             (per cent)                     (per cent)

Wheat            $0.80          100             $1.60          200
Cotton             .12          100               .06           50

Average                         100                            125


According to this method of calculation, the price level is 25 per cent 
higher in Country B than in Country A. But the reader can easily verify 
that, if Country B had been taken as the base and the price in Country A 
calculated relative to Country B, the price level in Country A would have 
appeared 25 per cent higher than in Country B. An unweighted arith- 
metic mean is therefore sometimes said to have an upward bias. On the 
other hand, the geometric mean of 2.00 and .50 is 1; hence the results are
consistent no matter which country is considered the base. 
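The reader's verification can be carried out in a few lines. The sketch below computes the price relatives of the wheat and cotton example with each country in turn as base, and compares the unweighted arithmetic and geometric means.

```python
from statistics import geometric_mean, mean

PRICES_A = {"Wheat": 0.80, "Cotton": 0.12}
PRICES_B = {"Wheat": 1.60, "Cotton": 0.06}


def relatives(given, base):
    """Price relatives (per cent) of the `given` country on the `base` country."""
    return [100 * given[name] / base[name] for name in base]


b_on_a = relatives(PRICES_B, PRICES_A)  # Country A = 100
a_on_b = relatives(PRICES_A, PRICES_B)  # Country B = 100

arithmetic_b_on_a = mean(b_on_a)
arithmetic_a_on_b = mean(a_on_b)
geometric_b_on_a = geometric_mean(b_on_a)
geometric_a_on_b = geometric_mean(a_on_b)

print(f"Arithmetic mean: B on A = {arithmetic_b_on_a:.0f}, A on B = {arithmetic_a_on_b:.0f}")
print(f"Geometric mean:  B on A = {geometric_b_on_a:.0f}, A on B = {geometric_a_on_b:.0f}")
```

With the arithmetic mean each country appears 25 per cent dearer than the other; with the geometric mean the two comparisons are reciprocals of one another.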

This paradox is due to a concealed change in the weighting system. 
Actually there is no such thing as an unweighted index; the weights are 
there, be they appropriate or otherwise. Now in the table above the
weight may be thought of as $1.00 worth of each commodity consumed,
which represents, for Country A, the base:

1¼ bushels of wheat @ $0.80 = $1.00.
8⅓ pounds of cotton @ .12 = 1.00.

Keeping the same quantity element in our weights, we may compute new
weights to be used when Country B is taken as 100:

1¼ bushels of wheat @ $1.60 = $2.00.

8⅓ pounds of cotton @ .06 = .50.

Let us now compute a weighted index number with Country B = 100.

                Value      Price relative     Weighted relative
Commodity       weight       B        A          B        A

Wheat           $2.00       100       50        200      100
Cotton            .50       100      200         50      100

Total           $2.50                           250      200
Index number                                    100       80

The results are now seen to be consistent. Retaining the quantity ele- 
ments in the weights constant in this manner, the price level in Country B is 
125 per cent of A, and in Country A it is 80 per cent of B. The so-called 
bias of the arithmetic mean turns out to be a matter of improper weighting. 
The arithmetic mean argues that, if we purchase the same commodities in 
the same relative quantities in the two countries, the index number is
25 per cent higher in Country B than in Country A; the geometric mean, 
to yield consistent results, requires that the value of the different com- 
modities purchased be in the same ratio in the two countries (thus necessi- 
tating that in Country B a relatively smaller quantity of wheat be pur- 
chased and a relatively larger quantity of cotton). 
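The consistency obtained by holding the quantity element constant can be verified directly. The sketch below prices the same fixed basket (one dollar's worth of each commodity at Country A prices) in both countries; the variable names are illustrative.

```python
PRICES_A = {"Wheat": 0.80, "Cotton": 0.12}
PRICES_B = {"Wheat": 1.60, "Cotton": 0.06}

# Fixed quantities: one dollar's worth of each commodity at Country A prices,
# that is, 1 1/4 bushels of wheat and 8 1/3 pounds of cotton.
QUANTITIES = {name: 1.00 / price for name, price in PRICES_A.items()}


def basket_cost(prices, quantities):
    """Cost of the fixed basket at the given prices."""
    return sum(prices[name] * quantities[name] for name in quantities)


cost_a = basket_cost(PRICES_A, QUANTITIES)
cost_b = basket_cost(PRICES_B, QUANTITIES)

b_relative_to_a = 100 * cost_b / cost_a  # Country A = 100
a_relative_to_b = 100 * cost_a / cost_b  # Country B = 100

print(f"Basket cost: A = ${cost_a:.2f}, B = ${cost_b:.2f}")
print(f"B as per cent of A: {b_relative_to_a:.0f}")
print(f"A as per cent of B: {a_relative_to_b:.0f}")
```

The two index numbers are now reciprocals, since both are ratios of the same two basket costs.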

A closely related argument for the geometric mean which is sometimes 
advanced is based upon the assertion that frequency distributions of price 
relatives tend to form a normal distribution when plotted on paper having 
a logarithmic X scale (or when the logarithms of the price relatives are 
plotted on arithmetic paper). The reasoning runs as follows: the doubling 
of a price represents as important a divergence (and is as likely to occur) 
as a decline to one-half of its former level; it is as likely to increase to a
given multiple of the base period as to fall to the reciprocal of that multiple;
it is as likely to rise to infinity as it is to fall to zero. The resulting
frequency distribution therefore tends to be normal geometrically, and the
geometric mean, which coincides with the mode of such a distribution, is the
appropriate average. This argument is logical but is based upon premises that
are not fully established. We are not sure that a price is as likely to
double as to drop one-half. It is frequently easier to cut off buyers by
doubling prices than to attract more purchases by cutting prices one-half.
In the early part of this chapter the deciles of a number of price distributions
were shown. In nearly every year the distribution was skewed negatively.
It is noteworthy also that in every year the price level was below
that of 1926. In fact, there seemed to be a rough tendency for these
distributions of relatives to be skewed in the direction of the change from
the base. This is not surprising in view of the much publicized tendency
for many prices to be rigid. The change in the price level is perhaps accounted
for in large part by the price movements of the flexible prices.
If it be true that rigid prices resist downward changes more persistently
than they do upward, we might expect logarithms of price relatives to be
skewed negatively on the average. At any rate, most distributions are
not normal geometrically, regardless of what may be true on the average.

It should not be thought that the geometric mean must never be used;
it merely is to be doubted that it has any inherent general superiority 
over the arithmetic mean. It is the belief of the authors that the average 
to use is determined in large part by the use for which the index numbers 
are intended. If, as is very often the case, we wish to compare the amount 
of money required at two different times or in two different places to pur- 
chase the same commodities (or perhaps the same amount of satisfaction 
by like individuals, with tastes and environment held constant), the 
weighted arithmetic mean should be used. This is because (as will be 
shown) such an index number may also be regarded as a weighted aggrega- 
tive index number. On the other hand, if the primary object is the study 
of price relatives, including their average behavior, the geometric mean 
may be useful. 

The mode is seldom advocated. If the mode of price relatives of paint 
and paint materials had been calculated, it would have remained at 100 
from 1926 perhaps through 1930! But after a number of years have 
elapsed, and when an index covering a broad field is sought, it is likely 
that the central tendency will not be sufficiently marked to justify use of 
the mode. The median is seldom used either, but might be appropriate 
if the accuracy or representative character of some of the data is in doubt. 
The harmonic mean has been suggested by Ferger (see footnote 2, Chapter 
XXI) if it is desired to use the reciprocal of the price index as an index 
of the purchasing power of money. 

Weighting systems. In the illustration in Table 143, the values mark- 
eted in the base year (1926) are used as the weights. Like any weighted 
average, this one is obtained by: first, multiplying the relatives by their 
weights; second, summing these figures year by year; and finally, dividing 
these totals for each year by the sum of the weights. The results are the 
same as those obtained for the aggregative index with base year quantity 
weights. The reason is obvious. Take a single commodity, building 
gravel: 

Value of 57,796,000 tons @ $0.941 (1926 price) = $54,386,000;

Value of 57,796,000 tons @ $0.911 (1927 price) = $52,652,000.

Price relative ($.911 ÷ $.941) = .96812, or 96.812 per cent;

$54,386,000 (the value in the base year) × .96812 = $52,652,000.

(Table 143 shows $52,646,000 instead of $52,652,000 for 1927 because the
1927 relative was taken as 96.8.)

This relationship is true, not only for each individual commodity, but 
for the aggregate values. In symbols: 



\[
\frac{\sum p_0 q_0 \dfrac{p_n}{p_0}}{\sum p_0 q_0} = \frac{\sum p_n q_0}{\sum p_0 q_0}.
\]

Evidently the method of weighted average of relatives is usually a 
roundabout method of doing what may more easily be accomplished by 
direct means using aggregates. Furthermore, the meaning of an aggre- 
gative index seems clearer to most persons than does an average of rela- 
tives. Why, then, should not the aggregative method always be used? 
One reason is that the price relatives themselves are occasionally worth
studying, not only because an individual series may hold special significance
for the reader, but because a study of groups of relatives may assist
in selecting a sample or determining what group indexes to make. In
connection with frequency distributions it was observed that an average
never gives a complete picture of any situation. Other measures may be
worth making. Another reason is that the series to be combined can
sometimes be obtained only in the form of relatives; for instance, Snyder's
Index of the General Price Level (published by the Federal Reserve Bank
of New York) is a weighted arithmetic average of a number of component
price indexes. The component indexes are: retail food prices; rents; other
cost-of-living items; prices of industrial commodities at wholesale; farm
prices at the farm; transportation costs; realty values; security prices;
equipment and machinery prices; hardware prices; automobile prices;
composite wages. The use of relatives is more common, however, in constructing
various types of quantity indexes, since the components of these
indexes are often themselves index numbers or other types of relatives.

More generally, the following relationships may be stated with regard to price index
numbers:

(1) An arithmetic average of relatives weighted by base year values ($p_0 q_0$) is the
equivalent of an aggregative index weighted with base year quantities.

(2) Similarly, an arithmetic average of relatives weighted by the product of base
year prices and given year quantities ($p_0 q_n$) is the equivalent of an aggregative index
weighted with given year quantities.

(3) A harmonic average of relatives weighted by given year values ($p_n q_n$) is the
equivalent of an aggregative index weighted with given year quantities. Thus

\[
\frac{\sum p_n q_n}{\sum p_n q_n \dfrac{p_0}{p_n}} = \frac{\sum p_n q_n}{\sum p_0 q_n}.
\]

(4) Similarly it may be shown that a harmonic average of relatives weighted by the
product of base year quantities and given year prices ($p_n q_0$) is the equivalent of an
aggregative index weighted with base year quantities.

These generalizations may be stated in the form of guides to the construction of index
numbers, when the index numbers are to be constructed from relatives:

(a) If it is desired to use the arithmetic average of relatives, the value weights should
be the products of the base prices and whatever quantities are desired.

(b) If it is desired to use an average of relatives employing value weights that are
the product of given year prices and quantities of some period, the harmonic average
should be used.

Under no circumstances should the arithmetic average of relatives be used with values
involving given year prices, since this gives extra weight to a commodity merely because
it has gone up in price. Such a procedure results in an upward bias.
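The four equivalences between averages of relatives and aggregative indexes, and the upward bias of an arithmetic average that uses given year prices in its weights, can be verified on a small example. This is a sketch under stated assumptions: the prices and quantities are hypothetical, and the helper names (`dot`, `values`, `arithmetic_average`, `harmonic_average`) are illustrative rather than drawn from the text.

```python
from math import isclose

# Hypothetical prices and quantities for three commodities (not from the text).
p0 = [1.00, 2.00, 4.00]   # base year prices
pn = [1.50, 1.80, 4.00]   # given year prices
q0 = [10, 5, 2]           # base year quantities
qn = [8, 6, 2]            # given year quantities


def dot(a, b):
    """Sum of products of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def values(prices, quantities):
    """Value weights: price times quantity for each commodity."""
    return [p * q for p, q in zip(prices, quantities)]


def arithmetic_average(relatives, weights):
    return dot(relatives, weights) / sum(weights)


def harmonic_average(relatives, weights):
    return sum(weights) / sum(w / r for r, w in zip(relatives, weights))


relatives = [n / o for n, o in zip(pn, p0)]
base_quantity_aggregative = dot(pn, q0) / dot(p0, q0)
given_quantity_aggregative = dot(pn, qn) / dot(p0, qn)

rule_1 = arithmetic_average(relatives, values(p0, q0))  # equals base quantity aggregative
rule_2 = arithmetic_average(relatives, values(p0, qn))  # equals given quantity aggregative
rule_3 = harmonic_average(relatives, values(pn, qn))    # equals given quantity aggregative
rule_4 = harmonic_average(relatives, values(pn, q0))    # equals base quantity aggregative

# The procedure warned against: arithmetic average with given year prices in the weights.
biased = arithmetic_average(relatives, values(pn, q0))

print(rule_1, rule_4, base_quantity_aggregative, biased)
```

With these figures the biased average exceeds the aggregative index it is meant to match, since the commodity whose price rose receives extra weight.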

Commodity weights versus group weights. The same practical advice 
may be offered concerning value weights that was given concerning quan- 
tity weights — only approximate accuracy is necessary. Nevertheless, the 
following consideration becomes important when only a limited number of 
commodities is chosen: Should the value weight selected for any given 
commodity be the value of that commodity entering the market, or should 
it refer to the whole group of commodities which the commodity represents? 
This is likely to be a far more important consideration, over relatively short 
periods of time, than the question of the period to which the weights refer. 
The answer to this question is that, unless it is practicable to increase the 
number of items in some groups (and perhaps decrease the number in 
others) sufficiently to obtain proportionate value representation for the 
different groups, it is decidedly better to adjust the weights of the different 
items so as to obtain such group representation. 

In Table 144 we have the application of the estimated total value of the
different subgroups marketed in 1926 to the individual commodity price
relatives of Table 142. The primary effect is to increase the weight of
hard maple, the price of which declined greatly during the depression.
Of considerable importance is the reduced weight of cement and structural
steel, the prices of which had by 1936 approximated their 1926 levels.
Gravel is increased in importance. Although this commodity did not
decline in price greatly during the depression, its general trend throughout
the period has been downward. The net result has been that the
index with subgroup weights declined more during the depression than
did that with commodity weights, and has regained less of its loss since
1932. It should not be concluded from the preceding illustration that
paucity of price quotations is desirable. Most satisfactory results will
be obtained if we select as large a number of commodities from each
group as feasible, and at the same time give additional weight to those
elements that are under-represented.

Another method of accomplishing the same result is to select as many
commodities as convenient for each group, to compute separate group indexes,
and then to combine the group indexes into a general index, using
the appropriate weights. Since the group indexes are relatives, their combination
presents no new problem.

It might further be noticed that weighting of commodities may in a
sense be regarded as a substitute for selecting the number of commodities
from the different groups in proportion to the value of those groups. For
instance, we might select commodities as follows from the different subgroups:

Brick and tile                  2 items
Cement                          1 item
Lumber                          7 items
Paint and paint materials       3 items
Plumbing and heating            1 item
Structural steel                1 item
Other building materials        7 items
All building materials         22 items

If the most important items in each group are selected, an unweighted
average of price relatives will yield reasonably good results. For instance,
such an index for 1931 was found by the writers to be 80.8 per cent, which
is not greatly different from the figure (81.3) obtained by the use of subgroup
value weights.

If an aggregative index is used, weighting quantities in accordance with the importance
of their group necessitates the use of derived quantity weights expressed in
abstract units. This is done, as shown in the following table, by dividing the market
value of the subgroup by the price per unit of the commodity representing the subgroup.
The derived quantities so calculated are applied in the usual fashion, in the same manner
as in Table 139.

Computation of Derived Quantity Weights of Seven Individual Commodities
to Correspond with the Importance of Their Subgroups, Building
Materials, 1926

                                Value            Unit price       Derived quantity
                                marketed         of subgroup      marketed
                                (thousands       representative   (thousands)
Commodity subgroup              of dollars)      (dollars)        [Col. 2 ÷ Col. 3]
(1)                             (2)              (3)              (4)

Brick and tile                    380,031          13.913             27,315
Cement                            260,803           1.744            149,543
Lumber                          1,358,705          55.673             24,405
Paint and paint materials         634,869           2.208            287,531
Plumbing and heating              281,213          12.374             22,726
Structural steel                  148,868           1.958             76,031
Other building materials        1,390,395            .941          1,477,572

Source: Tables 137 and [illegible].
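The derived quantity weights used when an aggregative index is to reflect subgroup importance are each simply a subgroup value divided by the unit price of the representative commodity. The sketch below recomputes them from the 1926 subgroup values and unit prices given in this section; the variable and function names are assumptions for illustration only.

```python
# (subgroup, value marketed in thousands of dollars, unit price of the
# representative commodity in dollars), as given for 1926 in the text.
SUBGROUPS = [
    ("Brick and tile", 380_031, 13.913),
    ("Cement", 260_803, 1.744),
    ("Lumber", 1_358_705, 55.673),
    ("Paint and paint materials", 634_869, 2.208),
    ("Plumbing and heating", 281_213, 12.374),
    ("Structural steel", 148_868, 1.958),
    ("Other building materials", 1_390_395, 0.941),
]


def derived_quantity(value_marketed, unit_price):
    """Abstract quantity weight (thousands of units): subgroup value / unit price."""
    return value_marketed / unit_price


derived = [round(derived_quantity(value, price)) for _, value, price in SUBGROUPS]

for (name, _, _), quantity in zip(SUBGROUPS, derived):
    print(f"{name:<28}{quantity:>12,}")
```

Multiplying each derived quantity by its unit price returns the subgroup value, so a base year aggregative index built on these quantities gives every subgroup its proper share.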

Before leaving this section, it is interesting to compare the results of the
major illustrations developed thus far with each other and with the 
United States Bureau of Labor Statistics index of building material prices. 

[Chart 213. Index Numbers of Building Materials as Obtained by Different Methods,
1926-1937. Vertical axis: per cent. Series shown: U.S. Bureau of Labor Statistics
index; aggregative with base year weights; simple arithmetic average of relatives;
averages weighted by base year subgroup values; simple aggregative. (Data of
Tables 138, 139, 142, and 144. United States Bureau of Labor Statistics index
numbers were obtained from the Bureau's bulletin, Wholesale Prices,
December and Year 1937, p. 9.)]

The latter includes the seven items here used and a great many additional 
ones. Although the Bureau’s weighting system is one which employs 
shifting weights, it affords a standard of comparison for index numbers 
constructed from the sample. It is easily apparent from inspection of 
Chart 213 that the weighting systems in the present instance rank in order 
of closeness of conformity as follows: 

(1) Base year subgroup value weights (Table 144). 

(2) Simple arithmetic average of relatives (Table 142). 

(3) Base year commodity weights (Table 139 or 143). 

(4) Simple aggregative (Table 138). 

Although the results of this experiment are not to be taken as conclusive, 
they do point to the desirability of giving each important element in the 
system approximately its correct weight, and they indicate the unfortunate
results sometimes attending the use of haphazard weighting. It may seem
surprising that the simple arithmetic average obtains, in this case, better
results than the index with base year weights. Ordinarily this would not
be the case if a reasonably large sample were selected. But in the present
instance, using only one commodity for each subgroup, it so happened
that giving equal weight to each commodity gave more accurate representation
to each subgroup than did weighting each commodity in accordance
with its own importance. Generally speaking, indexes involving the simple
arithmetic average of relatives or the simple aggregates would be the
least desirable.

[Source note to table: See Table 136. Figures for 1927-1937 are derived from the
$p_0 q_0$ column and relatives of Table 142.]

Quantity Index Numbers 

Aggregative type. An aggregative index number of quantity (physical
volume) is the counterpart of the corresponding price index. The general
quantity formula is

\[
Q = \frac{\sum q_n p}{\sum q_0 p}.
\]

Taking the building materials data, the construction of simple aggregative 
quantity index numbers is illustrated by Table 145. With 1926 prices as 
weights, an index of the quantity of building materials marketed may be 
constructed as in Table 146. This index number is not an estimate of the 
physical volume of building construction, nor is the corresponding price 
index an index of the cost of building construction. A considerable amount 
of construction material is purchased for the purpose of maintenance and 
repairs rather than new construction; furthermore, no account has been 
taken of labor cost. Although our price index parallels fairly closely the 
official index for the price of building materials, we have no basis of com- 
parison for our quantity indexes. Nor can great accuracy be claimed for 
them; in absence of complete data much of the quantity material from 
which the indexes were constructed was obtained by very rough estimates. 

Just as the aggregative index number of price measures the changing 
value of a fixed aggregate of goods, so the aggregative index number of 
physical volume measures the changing value of a varying aggregate of 
goods at fixed prices. The price index answers the question: If we buy 
the same assortment of goods each year, but at different prices, how much 
will we spend each year? The physical volume index answers the ques- 
tion: If we buy varying quantities of specified goods each year, but at the 
same price, how much will we spend each year? While in the former case 
the difference in amount spent was due to price change, in the latter case 
the difference must, of course, be attributed to changes in quantities bought 
and sold, since prices were held constant. 
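The two questions just posed can be put side by side in a few lines of code. This is a minimal sketch with hypothetical prices and quantities; the function names are assumptions, and base year weights are used in both indexes simply because that is the case illustrated in the text.

```python
# Hypothetical data for three commodities (not from the text).
p0 = [1.00, 2.00, 4.00]   # base year prices
pn = [1.50, 1.80, 4.00]   # given year prices
q0 = [10, 5, 2]           # base year quantities
qn = [12, 6, 3]           # given year quantities


def dot(a, b):
    """Sum of products of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def price_index(p_base, p_given, q_fixed):
    """Same assortment of goods each year, valued at different prices."""
    return dot(p_given, q_fixed) / dot(p_base, q_fixed)


def quantity_index(q_base, q_given, p_fixed):
    """Varying quantities each year, valued at the same prices."""
    return dot(q_given, p_fixed) / dot(q_base, p_fixed)


P = price_index(p0, pn, q0)       # sum(pn*q0) / sum(p0*q0)
Q = quantity_index(q0, qn, p0)    # sum(qn*p0) / sum(q0*p0)

print(round(100 * P, 1), round(100 * Q, 1))
```

In the first index only prices differ between numerator and denominator; in the second only quantities differ, so any change must come from the physical volume.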

[Source note to table: Tables 138 and 145.]

A number of different methods of weighting are available for the construction
of quantity index numbers, and in general the same considerations
apply that were discussed in connection with price index numbers.
In obtaining price weights which are averages of two or more years, the
average prices should be weighted average prices, obtained by dividing
the total value sold in these years by the total number of units in those
same years. Thus, if average quantities of base and given years are used,
we have the rather formidable looking formula


\[
Q = \frac{\sum q_n \left( \dfrac{p_0 q_0 + p_n q_n}{q_0 + q_n} \right)}{\sum q_0 \left( \dfrac{p_0 q_0 + p_n q_n}{q_0 + q_n} \right)}.
\]


Likewise, if the common factor method is used, the price weight should 
be derived from the largest value that is common to all the years in 
question. 

Averages of relatives. This method of constructing quantity index 
numbers is strictly analogous to the method applied to the measuring of 
price changes. The procedure is illustrated by Tables 147 and 148. As 
was found to be true with price index numbers, the use of base year value 
weights produces the same result as the aggregative method employing 
base year quantity weights. 

Although, whenever it is applicable, the aggregative method is to be 
preferred to the average of relatives method on account of ease of compu- 
tation and simplicity of meaning, there are circumstances when the aggre- 
gative method cannot be used. When the relatives which are to be aver- 
aged are percentages, not of a fixed base but of a changing normal, the 
average of relatives method is necessary. In other words the aggregative 
method cannot be used if an index of business cycles is to be constructed, 
since the data to be averaged are percentages of trend and seasonal. The 
Federal Reserve Bank of New York Monthly Index of Production and 
Trade, described in the next chapter, is an illustration of this point. 

Usually the weights selected for an average of quantity relatives are in 
proportion to the values in exchange of the different series. Occasionally, 
some consideration is given also to the relative amplitude of the different 
series, if they are cyclical relatives. Several illustrations of this technique 
will likewise be given in Chapter XXI. If an index is constructed, not 
for the purpose of measuring changes but for the purpose of forecasting
changes, the basis of selecting will be not the economic importance of the 
different series represented, but their importance for purposes of forecast- 
ing. See description of the Bradford B. Smith Forecasting Index on pp. 
817-820. 

Chapter XXI will describe methods of constructing a number of important
indexes, and will discuss various points of technique and theory
not covered in this chapter.

[Source note to table: $p_0 q_0$ values are from Table 143. Figures for 1927-1937 are
derived from the $p_0 q_0$ column and relatives of Table 147.]

CHAPTER XXI

INDEX NUMBER THEORY AND PRACTICE 


The object of this chapter is two-fold. First, the theory of index num- 
bers and certain refinements of technique will be further discussed. Sec- 
ond, a description of a number of indexes will be given. The indexes were 
selected partly on account of their wide usefulness, and partly on account 
of the interesting technique which they employ. In general it will be found 
that in actual practice none of the procedures outlined in Chapter XX
will be followed exactly, but that in each case there will be circumstances 
which justify special modifications of method. 

Index Number Concepts 

Mathematical tests. One school of thought on index numbers believes 
that there may be such a thing as a perfect index number formula, and 
that such a formula can be recognized by its ability to meet certain mathe- 
matical tests of consistency. Whether or not those tests are logically 
valid is an open question. Not only can an index be considered “ideal” 
if it meets those tests, according to this theory, but other indexes that do 
not meet them can be graded according to how closely they approximate 
them in actual practice. 

The tests are derived by the logic of analogy. Anything that is true of 
an individual commodity should also be true of a group of commodities 
considered as a whole. If a pound of cotton was worth 125 per cent as 
much in 1936 as it was in 1926, then the 1926 price was 80 per cent of the 
1936 price. Reasoning by analogy, if an index number for 1936 was 125 
with respect to a 1926 base, then a similar index number for 1926 should 
be 80 with respect to a 1936 base. In other words, an index number 
should work backward as well as forward. Again, suppose that a com- 
modity increases from 40 cents to 60 cents and that the sales increase 
from 2 units to 4 units. The price is 150 per cent of the base year, the 
quantity sales are 200 per cent, while the value is 1.50 X 2.00 = 3.00 
times the base year, or 300 per cent of the base year. This is verified by
noting that $\frac{.60 \times 4}{.40 \times 2} = 3$. Once more reasoning from analogy, it may be
argued that a price index times a quantity index computed from the same
data should equal the relative value of the transactions in the given year
with respect to the base year. In other words, if

\[
\frac{p_n}{p_0} \times \frac{q_n}{q_0} = \frac{p_n q_n}{p_0 q_0} = \frac{v_n}{v_0},
\]

then it should be true that

\[
P \times Q = \frac{\sum p_n q_n}{\sum p_0 q_0} = V.
\]

As indicated in the preceding paragraph, there are two tests which are 
considered especially important by the “mathematical test” school. These
may be called: (1) the time reversal test; (2) the factor reversal test. 

The time reversal test may be stated more precisely as follows: If the 
time subscripts of a price (or quantity) index number formula be inter- 
changed, the resulting price (or quantity) formula should be the reciprocal 
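of the original formula. If we take the formula

\[
\frac{\sum p_n q_0}{\sum p_0 q_0}
\]

and interchange the time subscripts, the resulting formula is

\[
\frac{\sum p_0 q_n}{\sum p_n q_n}.
\]

But

\[
\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_0 q_n}{\sum p_n q_n} \neq 1;
\]

hence the test is not met. On the other hand, the formula

\[
\sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}
\]

becomes

\[
\sqrt{\frac{\sum p_0 q_n}{\sum p_n q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0}};
\]

the product of the two expressions is unity, and the “ideal” index meets
the time reversal test.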

The factor reversal test may be stated in this way: If the p and q factors
in a price (or quantity) index formula be interchanged, so that a quantity
(or price) index formula is obtained, the product of the two indexes should
give the true value ratio

\[
\frac{\sum p_n q_n}{\sum p_0 q_0}.
\]

Again taking the formula

\[
\frac{\sum p_n q_0}{\sum p_0 q_0},
\]

we transform it into

\[
\frac{\sum q_n p_0}{\sum q_0 p_0}.
\]

This is a quantity index, but since

\[
\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum q_n p_0}{\sum q_0 p_0} \neq \frac{\sum p_n q_n}{\sum p_0 q_0},
\]

the test is not met. However, we find that

\[
\sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}
\]

transforms into

\[
\sqrt{\frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}}.
\]

The product of these two “ideal” indexes is

\[
\frac{\sum p_n q_n}{\sum p_0 q_0},
\]

and the test is met.


The “ideal” index number is so called because it is one of an extremely
limited number of indexes that meet both of these tests. 
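Both tests lend themselves to a mechanical check: interchange the time subscripts (or the p and q factors) and multiply. The sketch below applies the two tests to the base-year-weighted aggregative formula and to the “ideal” formula. The data are hypothetical, and the function names (`base_weighted`, `ideal`, `meets_time_reversal`, `meets_factor_reversal`) are assumptions chosen for illustration.

```python
from math import isclose, sqrt

# Hypothetical prices and quantities (not from the text).
p0 = [1.00, 2.00, 4.00]
pn = [1.50, 1.80, 4.00]
q0 = [10, 5, 2]
qn = [8, 6, 2]


def dot(a, b):
    """Sum of products of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def base_weighted(p_base, p_given, q_base, q_given):
    """Aggregative index with base year weights: sum(pn*q0) / sum(p0*q0)."""
    return dot(p_given, q_base) / dot(p_base, q_base)


def ideal(p_base, p_given, q_base, q_given):
    """Geometric mean of the base-weighted and given-weighted aggregative indexes."""
    return sqrt(
        dot(p_given, q_base) / dot(p_base, q_base)
        * dot(p_given, q_given) / dot(p_base, q_given)
    )


def meets_time_reversal(formula):
    forward = formula(p0, pn, q0, qn)
    backward = formula(pn, p0, qn, q0)   # time subscripts interchanged
    return isclose(forward * backward, 1.0)


def meets_factor_reversal(formula):
    price_index = formula(p0, pn, q0, qn)
    quantity_index = formula(q0, qn, p0, pn)   # p and q factors interchanged
    value_ratio = dot(pn, qn) / dot(p0, q0)
    return isclose(price_index * quantity_index, value_ratio)


for name, formula in [("base-weighted", base_weighted), ("ideal", ideal)]:
    print(name, meets_time_reversal(formula), meets_factor_reversal(formula))
```

On these figures the base-weighted formula fails both tests and the “ideal” formula passes both, which is the behavior the algebra above leads one to expect.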

Relationship of formula to use. The concept of an “ideal” index is
attacked by index number students belonging to a different school of
thought on the ground that the analyst cannot say exactly what the
“ideal” index measures; he can only assert vaguely that it measures a change
in the price level, or use some similar expression. To Willford I. King¹
the logical procedure is to ask a specific question, and then to devise a
formula which will answer that specific question. For instance, the formula
$\frac{\sum p_n q_0}{\sum p_0 q_0}$ compares the cost in the present year with the cost in the base
year of supporting the physical scale of living which obtained in the base
year. While this is a specific question, it may not be the most useful
question to ask. Just what is an appropriate question to ask is an important
problem facing the person conducting the investigation. In
Chapter XX Keynes was interpreted as believing it appropriate that,
for measuring changes in the value of money, one should first seek an
index number that would measure the changing cost of aggregates of goods
yielding the same utility to similar groups of persons at two periods.
Now the formula $\frac{\sum p_n q_0}{\sum p_0 q_0}$ assumes that, if their tastes do not change,
people will continue to buy the same amounts of goods no matter how
great the price rise or fall, while actually there is a shift from those
items which are becoming more expensive to those which are becoming
cheaper. From a Keynesian point of view, then, this formula would have
an upward “bias,” since the cost of obtaining the same quantity of goods
would be higher than the cost of obtaining the same quantity of utility.
The formula $\frac{\sum p_n q_n}{\sum p_0 q_n}$, on the other hand, compares the cost of supporting
one's present physical scale of living with its cost in the base year. This
formula, from the same point of view, has a downward “bias,” since no
sensible person would have bought the same goods in the base year as
he does now (even granting the same tastes and environment) because the
relative prices of goods would have been different. The cost of obtaining
the present year's bill of goods in the base year would have been greater
than the cost of obtaining the current year's economic satisfactions.

¹ See Willford I. King, Index Numbers Elucidated, Longmans, Green and Company,
New York, 1930, especially Chapter III.

Fisher’s “ideal” index formula is the geometric mean of two index num- 
bers biased (or inappropriate) in opposite directions; and many persons 
hold that the average of two wrong answers does not necessarily give one 
right answer, even though the two errors are in opposite directions and 
even though the formula is internally consistent. On the other hand, it 
is doubtful that Keynes’ common factor method will in actual practice 
answer Keynes’ question any better than (if as well as) the “ideal” index 
number. Changes in relative prices with consequent changes in relative 
quantities purchased may reduce the value of the common factor to a 
small proportion of the total goods bought. Nevertheless, it is a meritori- 
ous attempt to arrive at a logical decision as to exactly what one is trying 
to measure. 
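The position of the “ideal” index between the two aggregative formulas can be seen in a small numerical case. The sketch below assumes hypothetical data in which buyers shift away from the commodity whose price rose and toward the one whose price fell; under that assumption the base-weighted index lies above, and the given-year-weighted index below, their geometric mean. The names used are illustrative only.

```python
from math import sqrt

# Hypothetical data: the first commodity rises in price and less of it is bought;
# the second falls in price and more of it is bought.
p0 = [1.00, 2.00, 4.00]
pn = [1.50, 1.80, 4.00]
q0 = [10, 5, 2]
qn = [8, 6, 2]


def dot(a, b):
    """Sum of products of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


base_weighted = dot(pn, q0) / dot(p0, q0)    # cost of the base year bill of goods
given_weighted = dot(pn, qn) / dot(p0, qn)   # cost of the given year bill of goods
ideal = sqrt(base_weighted * given_weighted)

print(round(100 * base_weighted, 1),
      round(100 * ideal, 1),
      round(100 * given_weighted, 1))
```

Whether the geometric mean of the two answers is itself the right answer to any specific question is, of course, exactly the point in dispute; the code shows only where it falls numerically.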

For purposes of measuring changes in the value of money (purchasing
power of the dollar), it is customary to use the reciprocal of a price index.
Ferger, however, argues that this is illogical.² Just as a price index averages
together price changes of specific commodities, so a purchasing power
index should average together changes in the purchasing power of the
dollar for specific commodities. If the price of corn is $.50 per bushel,
the purchasing power of the dollar for corn is 2 bushels. Designating
units of purchasing power per dollar by the symbol u, Ferger suggests this
purchasing power index number formula:

\[
\text{Purchasing power} = \frac{\sum \dfrac{u_n}{u_0}\, p_0 q_0}{\sum p_0 q_0}.
\]

But since $\frac{u_n}{u_0} = \frac{p_0}{p_n}$, we may write

\[
\text{Purchasing power} = \frac{\sum \dfrac{p_0}{p_n}\, p_0 q_0}{\sum p_0 q_0}.
\]

This expression is the reciprocal of the harmonic mean of price relatives
weighted by base year values, since the latter is

\[
\frac{\sum p_0 q_0}{\sum \dfrac{p_0}{p_n}\, p_0 q_0}.
\]

So Ferger's formula is still in effect (though not in concept) the reciprocal
of a price index, though not the usual index based on the arithmetic mean.
Presumably it would be possible to alter somewhat the weighting system
without doing violence to his concept.

² See Wirth F. Ferger, “Distinctive Concepts of Price and Purchasing Power Index
Numbers,” Journal of the American Statistical Association, Vol. XXXII, June 1936,
pp. 258-272.
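The relation between a purchasing power index of this kind and the harmonic mean of price relatives can be confirmed numerically. The following sketch assumes base year value weights and two hypothetical commodities (the first priced at $.50 in the base year, like the corn of the text); all variable names are illustrative.

```python
from math import isclose

# Hypothetical data (not from the text): two commodities.
p0 = [0.50, 2.00]   # base year prices, dollars per unit
pn = [0.40, 2.50]   # given year prices
q0 = [100, 20]      # base year quantities

# Units of purchasing power per dollar: u = 1 / p (e.g. $.50 per bushel -> 2 bushels).
u0 = [1 / p for p in p0]
un = [1 / p for p in pn]

weights = [p * q for p, q in zip(p0, q0)]   # base year values p0*q0

# Arithmetic average of purchasing power relatives u_n/u_0, base value weights.
purchasing_power = sum((b / a) * w for a, b, w in zip(u0, un, weights)) / sum(weights)

# Harmonic and arithmetic means of the price relatives p_n/p_0, same weights.
price_relatives = [n / o for n, o in zip(pn, p0)]
harmonic_price_index = sum(weights) / sum(w / r for r, w in zip(price_relatives, weights))
arithmetic_price_index = sum(r * w for r, w in zip(price_relatives, weights)) / sum(weights)

print(purchasing_power, 1 / harmonic_price_index, 1 / arithmetic_price_index)
```

Here the purchasing power index equals the reciprocal of the harmonic price index but differs from the reciprocal of the usual arithmetic price index, which is the distinction drawn in the text.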

If we accept the idea that the purpose of an index number determines
its formula, we need not, necessarily, abandon the “ideal” formula. It
would be possible to maintain that, although the formula is not a perfect
solution to every index number problem, nevertheless there are purposes
for which it is especially suited, as for instance the analysis of value changes
into constituent price changes and quantity changes. However, it seemingly
would have to be abandoned as a theoretically sound index if we
take the position that every index number must answer a specific question
couched in layman's English.


The Chain Index 

Federal Reserve Bank of New York Index of Trend of Production and 
Trade in the United States. This index runs from 1830 to date, and is 
based upon 1870 as 100. The basic index (that is, the index before trend 
is computed) is on a 5-year rather than an annual basis, since it was impossible
to obtain satisfactory data for shorter periods. Some series could
be obtained only at 10-year intervals, in which case it was necessary to
make an estimate for intervening 5-year intervals. The list of series available
for inclusion in the index was small in 1830, but was much larger a
century later:

1830

Sugar imports
Cotton consumption
Newspapers, etc., published
Coal
Lead
Pig iron
Volume of U.S. imports
U.S. tonnage cleared
N.Y. State canal traffic
Gross postal receipts

1930

Crop production
Boots and shoes
Sugar imports
Rubber imports
Cotton consumption
Wool consumption
Silk imports
Lumber cut
Newspapers, etc., published
Coal
Lead
Copper
Zinc
Petroleum
Cement
Common brick
Face brick
Window glass
Pig iron
Steel
Motor vehicles, passenger and truck
Gas (manufactured) sold
Electricity
Employment in manufacturing
Postage stamps issued
Railway freight carried
Telephones
Trade and transportation employment

With a few exceptions these series are in physical, rather than value,
units. Not only is the 1930 list longer than the 1830 list, but four of the
ten 1830 series have disappeared from the index. In view of the growing
and changing list of items it was impossible to compare each year directly
with 1870. Consequently, the percentage that each year was of the year
five years preceding was computed. Furthermore, since the sample was
of necessity unsatisfactory from the standpoint of size, representativeness,
and accuracy (particularly during the early years) and the dispersion of
the relatives so great, it was thought advisable not to average together
all the items each year, but to compute a modified mean from the central
three or four items. The procedure for five successive pairs of years is
shown in Table 149. The modified means appearing in the bottom row
of the table may be called link relative index numbers.

The basic index is now obtained by a process of successive multiplication,
beginning with 1870 as 100, as illustrated by Table 150. The logic
is as follows:

Let 1870 be the base year, or 1870 = 100.0;

1875 is 132.0 per cent of 1870, or 1.320 × 100.0 = 132.0;

1880 is 145.3 per cent of 1875, or 1.453 × 132.0 = 191.8;

1885 is 136.3 per cent of 1880, or 1.363 × 191.8 = 261.4; and so on.

To obtain years preceding 1870, we must express each year as a percentage
of the year following. Thus, if 1870 is 137.8 per cent of 1865, then 1865 is
$\frac{100}{1.378} = 72.57$ per cent of 1870. This is the last number given in column
4 of Table 150. The other numbers in this column were obtained in a
similar fashion. We may then proceed to reason as follows:

Let 1870 be the base year, or 1870 = 100.0;

1865 is 72.57 per cent of 1870, or .7257 × 100 = 72.57;

1860 is 81.43 per cent of 1865, or .8143 × 72.57 = 59.09;

1855 is 81.37 per cent of 1860, or .8137 × 59.09 = 48.08; and so on.
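The chaining procedure just described, multiplying forward from the base and dividing backward from it, can be sketched as follows. The link relatives are those quoted in the text for 1855 through 1885; the function name `chain_index` and the dictionary layout are assumptions made for illustration. The sketch carries unrounded figures from step to step, so its results may differ from hand computation in the last decimal place.

```python
# Link relatives: later year as a percentage of the year five years earlier.
# Figures are those quoted in the text; the keys are (earlier year, later year).
LINKS = {
    (1855, 1860): 122.9,
    (1860, 1865): 122.8,
    (1865, 1870): 137.8,
    (1870, 1875): 132.0,
    (1875, 1880): 145.3,
    (1880, 1885): 136.3,
}


def chain_index(links, base_year, base_value=100.0, step=5):
    """Chain link relatives into an index with base_year = base_value."""
    index = {base_year: base_value}

    # Forward from the base: multiply by each successive link relative.
    year = base_year
    while (year, year + step) in links:
        index[year + step] = index[year] * links[(year, year + step)] / 100.0
        year += step

    # Backward from the base: divide by the link relative of the following period.
    year = base_year
    while (year - step, year) in links:
        index[year - step] = index[year] / (links[(year - step, year)] / 100.0)
        year -= step

    return dict(sorted(index.items()))


basic_index = chain_index(LINKS, base_year=1870)
for year, value in basic_index.items():
    print(year, round(value, 2))
```

Adding or dropping a series affects only the link relatives in which it appears, which is why the chain form suits a sample whose composition changes over a century.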


TABLE 150 

Construction of Chain Index op Production and Trade of the United States 
(From Link Relative Index Numbers) and Index of Trend, 1830-1935 


Period 

Cl) 

Link relative, 
later year as 
percentage 
of earlier 
(2) 

Period 

(3) 

Link relative, 
earlier year as 
percentage 
of later 
(4) 

Year 

(5) 

Basic 

index 

(6) 

Index 

of 

trend 

C7) 

1830-1835 

147.1 

1835-1830 

67 98 

1830 

10 80 

114 

1835-1840 

124.8 

1840-1835 

80 13 

1835 

15 88 

15 1 

1840-1845 

135.1 

1845-1840 

74 02 

1840 

19 82 

20 1 

1845-1850 

140 2 

1850-1845 

7133 

1845 

26 77 

26.7 

1850-1855 

128 1 

1855-1850 

78.06 

1850 

37 53 

35 5 

1855-1860 

122 9 

1860-1855 

81 37 

1855 

48 08 

47 1 

1860-1865 

122 8 

1865-1860 

81 43 

1860 

59 09 

63.4 

1865-1870 

137 8 

1870-1865 

72.57 

1865 

72 57 

84 5 





1870 

100 0 

112 0 

1870^1875 

132 0 


... 

1875 

132.0 

i 147.3 

1875-1880 

145.3 



1880 

1918 

i 192 5 

1880-1885 

136.3 



1885 

261.4 

249.8 

1885-1890 

132,7 



1890 

346.9 

1 322 0 

1890-1995 

121,7 



1895 

422 2 

* 412.2 

1895-1900 

129.3 



1900 

546 9 

1 524.0 

1900-1905 

1318 



1905 

719.5 

661 5 

1905-1910 

123.7 



1910 

890 0 

829.5 

1910-1915 

119.75 



1915 

1,065.8 

1,032.9 

1915-1920 

121.5 



1920 

1,294 9 

1,277.5 

1920-1925 

129.0 



1925 

1,670.4 

1,569.1 

1925-1930 ‘ 

94.5 



1930 

1,578.5 

1,914.0 


... 



1935 


2,318.9 


Source: Table 149 and Research Department, Federal Reserve Back of New York 
In Table 150 there is an additional column for the trend. It was recognized
that complete accuracy could not be expected for each individual
index number; but it was believed that a fitted trend would smooth out
the errors and give a reasonably accurate picture of the nature of the
growth of United States physical production and trade. The trend
equation fitted to the 1830-1860 data is

$$\log Y_c = .42677 + .0246669X,$$

with origin at 1845 and X units of one year. The trend values from this
equation are for 1830-1855. For trend values from 1856 to date, the
following equation was obtained by fitting to the 1840-1930 data:

$$\log Y_c = -.011585 + .0289066X - .0000597117X^2.$$

The origin for this equation is 1830. Trend values are computed for each
year, though Table 150 shows trend values for each fifth year only. The
index and trend are also shown in Chart 214. This index is not published
in any periodical, but may be obtained upon request from the Federal
Reserve Bank of New York.

Chart 214. Federal Reserve Bank of New York Annual Index of Production and Trade
and Fitted Trend, 1870-1930. (1870 = 100. Data of Table 150.)
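As a rough check on the trend column, the two equations can be evaluated directly. The sketch below is ours, not the Bank's procedure. Evaluated as printed, both equations give values about one tenth of those in column 7 of Table 150 (about 2.67 against 26.7 for 1845, for example). The factor `SCALE = 10` is therefore an assumption inferred from that comparison; the text does not state it. The function names are hypothetical.

```python
def trend_early(year):
    """First equation: origin 1845, X in years (used for 1830-1855)."""
    x = year - 1845
    return 10 ** (0.42677 + 0.0246669 * x)


def trend_late(year):
    """Second equation: origin 1830, X in years (used from 1856 on)."""
    x = year - 1830
    return 10 ** (-0.011585 + 0.0289066 * x - 0.0000597117 * x ** 2)


# ASSUMPTION: inferred by comparing the equations with column 7 of
# Table 150; the scaling is not stated in the text.
SCALE = 10.0


def trend(year):
    """Trend value on the scale of Table 150 (1870 basic index = 100)."""
    raw = trend_early(year) if year <= 1855 else trend_late(year)
    return SCALE * raw


for year in (1830, 1845, 1855, 1860, 1870, 1900, 1930, 1935):
    print(year, round(trend(year), 1))
```

The printed values can be compared with the trend column of Table 150 for the same years.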

The chief advantage of chain index numbers (described very briefly in
Chapter XX also) is the ease with which new items may be added to the
sample and old items dropped, or new items substituted for old ones.
The link relative index numbers, of which the chain index is composed,
compare data that are reasonably comparable, since the intervening period
is short and the quality of the items included has not had much time to
change. Furthermore, if the index is weighted, a weighting system may
be had that compares the two years with substantial accuracy. But when
the links are chained together, precise comparability is lost, and the
meaning of the index cannot be stated in simple language.

Circular test. Among the tests sometimes advocated for index numbers
is the circular test, which may be regarded as an extension of the time
reversal test but which is applicable to chain indexes only. It is argued
that, just as an index number formula should work backward as well as
forward between two years, so it should work in circular fashion among a
number of years. The working of the test is most easily explained by
reference to an illustration such as that given by Table 151. In that
table, which uses the data of Table 150, a circular chain is forged, which
begins with 1870, chains successively the links 1870-1875, 1875-1880,
1880-1885, 1885-1860, 1860-1865, 1865-1870, and arrives at the starting
point 1870 with a new index number. The link 1885-1860 is the mean of
the four central 1860 percentages with respect to 1885. As can be seen,
there is a discrepancy between the two index numbers for 1870: 100.00
and 92.19.

TABLE 151

Circular Test Applied to Federal Reserve Bank of
New York Index of Production and Trade in
the United States, 1860-1885

             Link                  Chain
Period       relative     Year     index

...          ...          1870     100.00
1870-1875    132.0        1875     132.00
1875-1880    145.3        1880     191.80
1880-1885    136.3        1885     261.42
1885-1860     20.84*      1860      54.48
1860-1865    122.8        1865      66.90
1865-1870    137.8        1870      92.19

* Positional mean is average of:

    Sugar imports             25.47
    Wool consumption          22.72
    Pig iron (production)     20.38
    Postage stamps issued     14.79

Source: Computed from data provided by Research Department, Federal
Reserve Bank of New York.
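The circular chain of Table 151 is easy to retrace numerically. The following sketch is offered only as an illustration; the variable names are ours. It forms the 1885-1860 link as the mean of the four percentages given in the footnote to Table 151, chains the six links starting from 1870 = 100, and reports the terminal value for 1870.

```python
# The four central 1860 percentages with respect to 1885 (footnote to Table 151).
central_percentages = [25.47, 22.72, 20.38, 14.79]
link_1885_1860 = sum(central_percentages) / len(central_percentages)

# Links in the order in which the circular chain is forged.
links = [
    ("1870-1875", 132.0),
    ("1875-1880", 145.3),
    ("1880-1885", 136.3),
    ("1885-1860", link_1885_1860),
    ("1860-1865", 122.8),
    ("1865-1870", 137.8),
]

value = 100.0  # starting point: 1870 = 100
for period, link in links:
    value = value * link / 100.0
    print(period, round(value, 2))

terminal_1870 = value
discrepancy = terminal_1870 - 100.0
print("Terminal value for 1870:", round(terminal_1870, 2))
print("Discrepancy:", round(discrepancy, 2))
```

A formula that met the circular test would return exactly 100 at the end of the loop; here the terminal value falls short of the starting value.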

The reasons why most chain indexes do not meet the circular test are 
as follows: 

(1) The type of formula is not one which meets the test. There are 
very few formulae that do. Among those that do are the geometric mean 
with constant value weights, and the aggregative index with constant 
quantity (or price) weights. The simple arithmetic mean gives a terminal 
value which is always greater than 100. The reason for the discrepancy 
is the implicit change in the quantity element in the value weights for a 
price index (see p. 599), or in the price element in the value weights for a 
quantity index. The simple median has no bias one way or the other, 
but it is only by accident that this median may meet the test. A modified 
mean which includes a large proportion of the available items would pre- 
sumably have an upward bias. However, when the proportion of items 
is small, as in our present illustration, the discrepancy might be in either 
direction. The variable nature of the commodities averaged by our dif- 
ferent modified means is the most important reason for our present dis- 
crepancy. Below is the list of series used. Of the 21 series appearing 
during the six periods, 17 are different; not one appears throughout, and 
only one series (Newspapers, etc., published) appears as often as three 
times. 


1860-1865
    Employment in manufacturing
    Crop production index
    Trade and transportation employment
    Newspapers, etc., published

1865-1870
    Coal (production)
    U. S. tonnage cleared
    Gross postal receipts

1870-1875
    Copper (production)
    Newspapers, etc., published
    Boots and shoes manufactured
    Rubber imports

1875-1880
    Cotton consumption
    Zinc (production)
    Rubber imports

1880-1885
    Steel (production)
    Trade and transportation employment
    Newspapers, etc., published

1885-1860
    Sugar imports
    Wool consumption
    Pig iron (production)
    Postage stamps issued


(2) Generally an important reason for using a chain index is that the 
relative importance of the different items changes with the passage of 
time, and the chain index is one device for putting such changes into 
effect. Such changes in weights are intentional. But, as explained in 
the preceding paragraph, they may be unintentional, the result of using
an inappropriate type of weighting for the index number formula used.
Whether the changes are intentional or unintentional, they cause the chain
index to fail to meet the circular test. 

(3) Usually also the list of commodities changes over time, and this is 
an important reason for employing the chain index. In the present in- 
stance there were a few changes in the list of commodities during the 
period 1860-1885. 

Whenever there is a change in the commodities used or the weights 
given to them, there is a break in comparability. Consequently most 
authorities argue that there is no reason why a chain index should meet the 
circular test. If the commodities and weights remain the same, most of 
the index number formulae will meet the test, if they are used properly. 
But if the commodities and weights remain the same, there is no reason 
for using the chain index. 


Substituting New Commodities and Changing Weights 


The preceding chain index illustration concerned a simple average of 
relatives. The use of weights would have introduced no new problem. 
However, when an aggregative type of index is used, the problem of ad- 
justing weights puts certain pitfalls in the path of the statistician. 

We shall again utilize the building material price data with which we
are familiar. Let us assume that it is desirable to substitute California
redwood for hard maple beginning 1936. Consequently there are two sets
of $\sum p_{36} q_{26}$ values shown in Table 152: one for the old series,
using hard maple; and a second for the new series, which uses California
redwood instead. The $\sum p_{37} q_{26}$ product for 1937 uses California
redwood. The old series shows that 1936 prices are 88.12 per cent of the
1926 prices, while the new series indicates that 1937 is 108.24 per cent
of 1936. But since the 1936 index is 88.12, the 1937 index, relative to
1926, is 88.12 × 1.0824 = 95.38.


The pitfall into which we have stepped is that lumber is given additional
weight by the substitution of redwood for maple. This is due to the fact
that the price of redwood is considerably higher than that of maple, and
since the same quantity weights were used in the new series as the old,
that part of the value aggregate due to the lumber was greatly enhanced.
Specifically the value of the lumber in thousands of dollars was increased
from 1,154,893 to 1,482,604; or, we may say, its relative importance in
the index was increased from 1,154,893 ÷ 3,925,443 = 29.4 per cent to
1,482,604 ÷ 4,253,154 = 34.9 per cent.
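The pitfall can be reproduced from the prices and quantity weights of Table 152. The sketch below is an illustration only. The dictionary keys and function names are ours, and the products are carried at full precision rather than rounded to whole thousands, so the results agree with the table only to about the second decimal place.

```python
# Quantity weights (thousands of units) and prices (dollars), Table 152.
q26 = {"brick": 27_315, "cement": 149_543, "lumber": 24_405, "paint": 287_531,
       "lavatories": 22_726, "steel": 76_031, "gravel": 1_477_572}
p26 = {"brick": 13.913, "cement": 1.744, "lumber": 55.673, "paint": 2.208,
       "lavatories": 12.374, "steel": 1.958, "gravel": 0.941}
# 1936 prices, old series: the lumber representative is hard maple.
p36_old = {"brick": 12.313, "cement": 1.667, "lumber": 47.322, "paint": 2.031,
           "lavatories": 8.699, "steel": 1.860, "gravel": 0.854}
# 1936 prices, new series: California redwood replaces hard maple.
p36_new = dict(p36_old, lumber=60.750)
# 1937 prices: the lumber representative is California redwood.
p37 = {"brick": 12.048, "cement": 1.667, "lumber": 72.000, "paint": 2.031,
       "lavatories": 9.086, "steel": 2.215, "gravel": 0.886}


def aggregate(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(prices[item] * quantities[item] for item in quantities)


def lumber_share(prices, quantities):
    """Lumber's percentage share of the value aggregate."""
    return 100 * prices["lumber"] * quantities["lumber"] / aggregate(prices, quantities)


index_1936 = 100 * aggregate(p36_old, q26) / aggregate(p26, q26)
link_1937 = 100 * aggregate(p37, q26) / aggregate(p36_new, q26)
chain_1937 = index_1936 * link_1937 / 100

share_old = lumber_share(p36_old, q26)
share_new = lumber_share(p36_new, q26)

print(round(index_1936, 2), round(link_1937, 2), round(chain_1937, 2))
print(round(share_old, 1), round(share_new, 1))
```

The last line shows the share of lumber in the 1936 aggregate before and after the substitution, which is the source of the extra weight discussed above.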

As might be suspected, the remedy for this difficulty is to decrease the
quantity weight for the lumber representative in the same ratio that the
price of maple is to the price of redwood. This ratio is

$$\frac{60.750}{47.322} = 1.283758,$$



TABLE 152

(Quantity weights in thousands; prices in dollars; products in thousands of dollars)

                                   Quantity   -------- 1926 --------   ----------------- 1936 ------------------   -------- 1937 --------
                                   weights                                         Product,      Product,
                                              Price      Product       Price       old series    new series       Price      Product
Commodity                          q26        p26        p26 q26       p36         p36 q26       p36 q26          p37        p37 q26
(1)                                (2)        (3)        (4)           (5)         (6)           (7)              (8)        (9)

Common building brick                 27,315   13.913      380,034      12.313       336,330       336,330         12.048      329,091
Portland cement                      149,543    1.744      260,803       1.667       249,288       249,288          1.667      249,288
Hard maple, No. 1                     24,405   55.673    1,358,700      47.322     1,154,893         ...             ...         ...
California redwood                    24,405     ...         ...        60.750         ...       1,482,604         72.000    1,757,160
Outside white gloss house paint      287,531    2.208      634,868       2.031       583,975       583,975          2.031      583,975
Lavatories                            22,726   12.374      281,212       8.699       197,693       197,693          9.086      206,488
Structural steel                      76,031    1.958      148,869       1.860       141,418       141,418          2.215      168,409
Building gravel                    1,477,572     .941    1,390,395        .854     1,261,846     1,261,846           .886    1,309,129

Total                                                    4,454,881                 3,925,443     4,253,154                   4,603,540
Link relative index number                                  100.00                     88.12        100.00                      108.24
Chain index number                                          100.00                     88.12                    88.12 × 1.0824 = 95.38

Source: Table 138 and p. 604. Units are shown in Table 138.
and the new quantity weight is 24,405 thousands of board feet ÷ 1.283758
= 19,010.6 thousands of board feet. The use of this quantity weight
gives the same $\sum p_{36} q_{26}$ value for the old series as for the new,
as can be seen by inspection of columns 6 and 7 of Table 153. This means
that the importance of the lumber item is unchanged after making the
substitution of redwood for maple. Now, since redwood increased in price by
a relatively large percentage between 1936 and 1937, the 1937 link relative
with 1936 as base is smaller than in Table 152, which gave too much weight
to the item. The new link relative is shown by Table 153 to be 107.38,
which gives a 1937 chain index number of 88.12 × 1.0738 = 94.62. This
index number can also be derived directly from the data of Table 153 by
the expression

$$P_{37} = \frac{\sum p_{37} q'_{26}}{\sum p_{26} q_{26}} = \frac{4{,}215{,}143}{4{,}454{,}881} = 94.62,$$

where q' is a quantity multiplied by the appropriate correction factor.
We see therefore that column 7 of Table 153 is not needed, and that it is
not necessary to obtain link relatives and chain them to the base, if we
merely make the appropriate adjustment in quantity weights whenever a
new commodity is introduced.
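The equivalence claimed in this paragraph can be verified with a short calculation. The sketch below is ours and is meant only as an illustration. It takes the totals for the items other than lumber from the product columns of Tables 152 and 153, scales the lumber weight by the maple-to-redwood price ratio, and compares the chained result with the one obtained directly from the 1926 aggregate. All variable names are hypothetical.

```python
# Lumber data (quantity in thousands of board feet, prices in dollars).
q_maple = 24_405.0
p36_maple = 47.322
p36_redwood = 60.750
p37_redwood = 72.000

# Aggregates in thousands of dollars, from the product columns of the tables.
base_aggregate_1926 = 4_454_881.0              # sum of p26 * q26
other_items_1936 = 3_925_443.0 - 1_154_893.0   # 1936 total less the maple product
other_items_1937 = 4_603_540.0 - 1_757_160.0   # 1937 total less the redwood product

# Adjusted weight: reduce the quantity in the ratio of maple price to redwood price.
q_adjusted = q_maple * p36_maple / p36_redwood

aggregate_1936_old = other_items_1936 + q_maple * p36_maple
aggregate_1936_new = other_items_1936 + q_adjusted * p36_redwood
aggregate_1937 = other_items_1937 + q_adjusted * p37_redwood

# Route 1: link relative for 1937 chained to the 1936 index number.
index_1936 = 100 * aggregate_1936_old / base_aggregate_1926
link_1937 = 100 * aggregate_1937 / aggregate_1936_new
chained_1937 = index_1936 * link_1937 / 100

# Route 2: direct comparison with the 1926 aggregate.
direct_1937 = 100 * aggregate_1937 / base_aggregate_1926

print(round(q_adjusted, 1), round(link_1937, 2))
print(round(chained_1937, 2), round(direct_1937, 2))
```

Because the adjusted weight leaves the 1936 aggregate unchanged, the two routes agree apart from rounding, which is why the link relative need not be computed at all.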

The chain index is useful, not only when making substitutions in the
list of commodities, but when making a change in the weights. Suppose
that new quantity weights (say 1935 quantities) are to be introduced in
1936. The 1936 index number, which uses the 1926 weights, will be
unchanged at 88.12. We shall, however, have different products for
comparing 1937 with 1936. For instance (using hypothetical data not shown
in our tables), we may have

$$\frac{\sum p_{37} q_{35}}{\sum p_{36} q_{35}} = \frac{4{,}376{,}918}{4{,}091{,}765} = 106.97.$$

This would give a chain index value for 1937 of 88.12 × 1.0697 = 94.26.

Now if the index is to be carried forward with the new set of weights,
it would be somewhat laborious to chain each link relative to the index
number for the preceding year in order to obtain a chain index. A much
easier procedure is to adjust the base year aggregate in such a way that the
same result can be obtained directly. In the present instance we may
obtain the adjusted base year aggregate as follows:

$$\sum p_{26} q'_{35} = \sum p_{36} q_{35} \div P_{36} = 4{,}091{,}765 \div .8812 = 4{,}643{,}401.$$



TABLE 153

(Quantity weights in thousands; prices in dollars; products in thousands of dollars)

                                   Quantity   -------- 1926 --------   ----------------- 1936 ------------------   -------- 1937 --------
                                   weights                                         Product,      Product,
                                              Price      Product       Price       old series    new series       Price      Product
Commodity                          q26        p26        p26 q26       p36         p36 q26       p36 q'26         p37        p37 q'26
(1)                                (2)        (3)        (4)           (5)         (6)           (7)              (8)        (9)

Common building brick                 27,315   13.913      380,034      12.313       336,330       336,330         12.048      329,091
Portland cement                      149,543    1.744      260,803       1.667       249,288       249,288          1.667      249,288
Hard maple, No. 1                     24,405   55.673    1,358,700      47.322     1,154,893         ...             ...         ...
California redwood                  19,010.6*    ...         ...        60.750         ...       1,154,894         72.000    1,368,763
Outside white gloss house paint      287,531    2.208      634,868       2.031       583,975       583,975          2.031      583,975
Lavatories                            22,726   12.374      281,212       8.699       197,693       197,693          9.086      206,488
Structural steel                      76,031    1.958      148,869       1.860       141,418       141,418          2.215      168,409
Building gravel                    1,477,572     .941    1,390,395        .854     1,261,846     1,261,846           .886    1,309,129

Total                                                    4,454,881                 3,925,443     3,925,444                   4,215,143
Link relative index number                                  100.00                     88.12        100.00                      107.38
Chain index number                                          100.00                     88.12                                     94.62

* Adjusted quantity weight; see text.
It is obvious that our final index number for 1937 may be obtained as
follows:

$$P_{37} = \frac{\sum p_{37} q_{35}}{\sum p_{26} q'_{35}} = \frac{4{,}376{,}918}{4{,}643{,}401} = 94.26 \text{ per cent.}$$

Subsequent index numbers with 1926 as a base are now computed directly
by the expression

$$P_n = \frac{\sum p_n q_{35}}{\sum p_{26} q'_{35}}.$$
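The shortcut of adjusting the base year aggregate can be checked with the hypothetical figures used in the text. The sketch below is an illustration under those figures; the variable names are ours. It computes the 1937 index once by chaining and once from the adjusted base aggregate.

```python
# Hypothetical aggregates from the text (thousands of dollars),
# both computed with the new 1935 quantity weights.
aggregate_1936_q35 = 4_091_765.0   # sum of p36 * q35
aggregate_1937_q35 = 4_376_918.0   # sum of p37 * q35

index_1936 = 88.12                 # 1936 index on the 1926 base, old weights

# Route 1: chain the 1937 link relative to the 1936 index number.
link_1937 = 100 * aggregate_1937_q35 / aggregate_1936_q35
chained_1937 = index_1936 * link_1937 / 100

# Route 2: adjust the base year aggregate once, then divide directly.
adjusted_base = aggregate_1936_q35 / (index_1936 / 100)
direct_1937 = 100 * aggregate_1937_q35 / adjusted_base

print(round(link_1937, 2))         # link relative, 1937 on 1936
print(round(adjusted_base))        # adjusted base year aggregate
print(round(chained_1937, 2), round(direct_1937, 2))
```

Once `adjusted_base` is in hand, each later year requires only one new aggregate and one division.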

The United States Bureau of Labor Statistics claims these advantages
for the two techniques explained above³ (which may be employed in
combination as well as separately): (1) The relative importance of a
commodity is unaffected by the substitution of one price series for another.
(2) The making of special group indexes is facilitated, since the chaining
back process is side-stepped.


Some Price Indexes 

United States Bureau of Labor Statistics Index of Wholesale Commodity
Prices. This index, kept up to date on an annual, monthly, and weekly
basis, is probably the most widely used price index in existence.⁴ It
extends back on a monthly and annual basis through 1890, and on a weekly
basis through 1931. In January 1931 the number of price series included
in the index was increased from 550 to 784, and the calculations were
revised on that basis back to and including 1926. At present 813 series are
included. A feature of the index is that index numbers of several groups
of commodities are published, as well as those of the 813 series as a whole.
These groups are:

(1) Farm products. 

(2) Foods. 

(3) Hides and leather products. 

(4) Textile products. 

(5) Fuel and lighting. 

³ See “Revised Method of Calculation of the Wholesale Price Index of the United
States Bureau of Labor Statistics,” by Jesse M. Cutts and Samuel J. Dennis, Journal
of the American Statistical Association, Vol. XXXII, December 1937, pp. 663-674.

⁴ For index numbers of wholesale prices 1890-1926, see United States Bureau of Labor
Statistics Bulletin No. 543, Wholesale Prices, 1930; for 1926-1931, see Bulletin No. 572,
Wholesale Prices, 1931. Figures for subsequent periods are shown in a monthly
pamphlet, Wholesale Prices, in the Monthly Labor Review, and in monthly and weekly official
releases. Mimeographed tables covering the entire period are available on request from
the United States Bureau of Labor Statistics. For further description of the index,
see the article by Cutts and Dennis referred to in footnote 3.

(6) Metals and metal products.

(7) Building materials. 

(8) Chemicals. 

(9) House-furnishing goods. 

(10) Miscellaneous. 

Each of these groups is further subdivided into a number of subgroups,
and separate index numbers are computed for each. Articles properly
falling under more than one classification are so listed: thus, structural
steel is included under building materials as well as under metals and
metal products; and eggs are considered both as a farm product and as a
food. In the computation of the general index, however, there is no
duplication; each commodity is counted only once. In addition to this
classification according to the nature of the commodity, there are two other
classifications. The first is based on origin: (1) farm products;
(2) non-agricultural commodities; (3) all commodities other than farm
products and foods. The second is based on degree of manufacture: (1) raw
materials; (2) semi-manufactured products; (3) finished products.

The index numbers are constructed by the aggregative method.
Beginning in 1934, current prices of the 784 commodities have been multiplied
by the average quantities of these commodities marketed in 1929 and
1931 (1929, 1930, and 1931, in the case of farm products and agricultural
commodities). Prices of the same commodities in 1926 were also
multiplied by the average of 1929 and 1931 quantities. The products are
summed for each period, and are divided by the 1926 product-sum, in
order to express the index number as a percentage of 1926. For 1934 the
index number formula therefore is

$$P_{1934} = \frac{\sum p_{1934}\, q_{1929,1931}}{\sum p_{1926}\, q_{1929,1931}}.$$
The 1929, 1931 weights were not used in computing the index numbers for 
every year. The system of weighting is as follows:

Period                        Mean of quantities in:

1913-1914 inclusive           1909 and 1914
1915-1919 inclusive           1914 and 1919
1920-1921 inclusive           1919 and 1921
1922-1923 inclusive           1921 and 1923
1924-1929 inclusive           1923 and 1925
1930-1931 inclusive           1925 and 1927
1932-1933 inclusive           1927 and 1929
Beginning 1934                1929 and 1931

The quantity weights are obtained from census reports and from other 
governmental and private sources. Current weights cannot be used ex- 
cept for revisions of the index, but the weights are kept as nearly up to 
date as is practicable. 
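The weighting schedule amounts to a simple lookup from the year of the index number to the pair of years whose quantities are averaged. The sketch below is only an illustration of that lookup. The names `WEIGHT_SCHEDULE` and `weight_years` are ours, and the special case of farm products (for which 1930 quantities also enter the current weights) is deliberately left out.

```python
# (first year, last year, years whose quantities are averaged);
# last year None means "from the first year onward".
WEIGHT_SCHEDULE = [
    (1913, 1914, (1909, 1914)),
    (1915, 1919, (1914, 1919)),
    (1920, 1921, (1919, 1921)),
    (1922, 1923, (1921, 1923)),
    (1924, 1929, (1923, 1925)),
    (1930, 1931, (1925, 1927)),
    (1932, 1933, (1927, 1929)),
    (1934, None, (1929, 1931)),
]


def weight_years(year):
    """Return the years whose mean quantities weight the index for `year`."""
    for first, last, years in WEIGHT_SCHEDULE:
        if year >= first and (last is None or year <= last):
            return years
    raise ValueError(f"no weighting period listed for {year}")


print(weight_years(1926))
print(weight_years(1937))
```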
The introduction of new commodities or of different weights from time
to time necessitated some method of splicing the index together so that
comparability would be retained, and from 1908 until 1937 the index has
been computed as a chain index. Until 1937, when a new series was
introduced to replace an old one which was no longer satisfactory or
available, a procedure was followed similar to that shown in Table 152. (Of
course, no substitution was made which destroyed the continuity of any
series to such an extent as that illustrated.) Because of the theoretical
and practical difficulties, the technique was modified in 1937 and is now
like that indicated at the end of the preceding section dealing with
substituting new commodities and changing weights. The weekly, monthly,
and annual indexes are calculated independently of each other, and as a
result of early imperfections in technique the weekly indexes differ slightly
from the monthly index. The revised method, however, will bring the
two indexes into substantial conformity from 1937 on.

The United States Bureau of Labor Statistics also makes a retail food
price index, and an index of the cost of living, of which the retail food
price index is an important component. The cost of living index is
described in the following paragraphs, while a brief description of the index
of retail food prices will be found in Davenport and Scott, An Index to
Business Indexes, pp. 69-70, Business Publications, Inc., Chicago, 1937.

Changes in cost of living: United States Bureau of Labor Statistics Index.
Index numbers of cost of living are currently computed each month by
the National Industrial Conference Board,⁵ and several times each year
(usually quarterly) by the United States Bureau of Labor Statistics. The
United States Bureau of Labor Statistics computes separate indexes for
each of 32 cities with population over 50,000, and for the United States
as a whole. The total cost of living index numbers for each city and for
the nation are themselves obtained by combining six group indexes:

(1) Food. 

(2) Clothing. 

(3) Rent. 

(4) Fuel and light. 

(5) House-furnishing goods.

(6) Miscellaneous goods and services. 

These separate indexes for each city are computed by the aggregative
method, the weights in each city being the average amount of goods
purchased per family per year by wage earners and low-salaried workers in
that city. Frequently a weight applied to a particular commodity is not
the amount of that particular commodity purchased, but is derived from
the amount spent for a group of related commodities represented by the
commodity in question. The different group indexes are chain indexes;
this method is used because of the constant change in the form in which
consumer goods are offered for sale. In combining the separate group
indexes into a cost-of-living index for any city, the six index numbers are
weighted by 1917-1919 expenditures for the city in question.

⁵ This index is published currently in the Survey of Current Business. A discussion
of it will be found in Conference Board Research Staff, Cost of Living in the United
States, 1914-1936, National Industrial Conference Board, New York, 1936.

No attempt is made to compare the relative cost of living in the differ- 
ent cities, but for each city a comparison is made between its current cost 
of living and its cost in the base period, 1923-1925. In combining the 
different city indexes into a national index, the weights are in proportion 
to the population of the metropolitan areas where retail prices are collected 
plus that of adjacent large urban centers in which it is believed that prices 
move in a similar fashion. A national index is made for all items and for
each of the six groups.⁶

Geographical variations in cost of living: W. P. A. Index of Intercity
Differences. The computation of index numbers comparing the cost of
living in different cities is much more difficult than making a comparison
over time, because consuming habits vary so much geographically, and
because the purchase of the same goods yields such a varying amount of
satisfaction in different regions. For instance, an expenditure for fuel that
is ample for heating a house one season in a city in Maine would suffice
for a number of years in Birmingham, Alabama. Nevertheless, the Works
Progress Administration attempted a comparison between the cost of
living in 59 cities in March 1935.⁷ Instead of using as weights average
quantities purchased per family, synthetic budgets were constructed, one
at the maintenance level and one at the emergency level, and two sets of
index numbers were constructed. Although the same general budget was
used for each city, variations were made for many of the factors. For
example, the need for heating and refrigeration varies with climate, and
the need for transportation varies with population and land area.
Consequently, in order to obtain the same standard of living in different cities,
it was necessary to vary the quantity of these items allowed in the
different cities. Cost of refuse disposal and school attendance was a part of
the cost of living in some cities but not in others, depending on the local
laws. Likewise the allowance for taxation was not uniform. On the other
hand, the cost of postage, telephone calls, and insurance was assumed to
be the same everywhere.

⁶ For further description, see “Revision of Index of Cost of Goods Purchased by
Wage Earners and Lower-Salaried Workers,” by Faith M. Williams, Margaret H. Hogg,
and Ewan Clague, in the Monthly Labor Review, Vol. 41, September 1935, pp. 819-837.

⁷ See Intercity Differences in Cost of Living in March 1935, 59 Cities, a report by
Margaret Loomis Stecher, Works Progress Administration, Division of Social Research,
Washington, 1937.

This brief discussion is not a description of the method of constructing 
this index; it is intended merely to give some idea of the tremendous diffi- 
culties which such a study entails. 

Snyder's Index of the General Price Level. The Federal Reserve Bank
of New York maintains currently an index of the general price level, which
is a weighted arithmetic average of a number of component price indexes.⁸
These series are:


Series                                       Weight

Retail food prices                             10
Rents                                           5
Other cost-of-living items                     10
Industrial commodities at wholesale            10
Farm prices at the farm                        10
Transportation costs                            5
Realty values                                  10
Security prices                                10
Equipment and machinery prices                 10
Hardware prices                                 3
Automobile prices                               2
Composite wages                                15

General price level                           100
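A weighted arithmetic average of this kind is quickly sketched. In the code below the weights are those listed above, but the component index values are invented purely for illustration (every component is set at 100 except wages, set at 120); the function name `general_price_level` is also ours and is not taken from the Bank's description.

```python
# Weights of the component series, as listed above.
WEIGHTS = {
    "Retail food prices": 10,
    "Rents": 5,
    "Other cost-of-living items": 10,
    "Industrial commodities at wholesale": 10,
    "Farm prices at the farm": 10,
    "Transportation costs": 5,
    "Realty values": 10,
    "Security prices": 10,
    "Equipment and machinery prices": 10,
    "Hardware prices": 3,
    "Automobile prices": 2,
    "Composite wages": 15,
}


def general_price_level(component_indexes):
    """Weighted arithmetic average of the component index numbers."""
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * component_indexes[name] for name in WEIGHTS)
    return weighted_sum / total_weight


# HYPOTHETICAL component values, for illustration only.
components = {name: 100.0 for name in WEIGHTS}
components["Composite wages"] = 120.0

composite = general_price_level(components)
print(sum(WEIGHTS.values()), composite)
```

With wages carrying 15 of the 100 weight points, a 20-point rise in that series alone moves the composite by 3 points.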


Originally this index was intended as a companion index to Snyder’s 
Index of the Volume of Trade, and Snyder was able to find a number of 
interesting relationships between these two and other logically related 
series. Since the volume of trade index is no longer so inclusive as the 
price index, comparisons involving the two indexes no longer have the 
same significance. 

Indexes of Physical Volume of Production and Trade 

Board of Governors of the Federal Reserve System Index of Industrial
Production. This index, which is published monthly in the Federal Reserve
Bulletin, is a good illustration of a quantity index constructed by the
aggregative method. The general index is itself composed of two indexes:
an index of manufactures, with 53 series combined into 13 separate
indexes of the different industrial groups, in addition to a number of
subgroups; and an index of minerals, with 8 series. The base period of the
indexes, which are carried back to the beginning of 1919, is the 3-year
average of 1923-1925.

⁸ The current index numbers of the general price level are published in the Monthly
Review of Credit and Business Conditions, Second Federal Reserve District. For a
detailed description of this index, see “The Measure of the General Price Level,” by Carl
Snyder, in The Review of Economic Statistics, February 1928, pp. 40-52.

Much difficulty was encountered in obtaining suitable series for the
index of manufactures. The construction industry, for instance, is not
directly represented. Furthermore, many series that are available are not
strictly comparable over a long period of time. An index of physical
volume which fails to take into consideration the constant refinements
made on mechanical contrivances will underestimate the growth of such
industries. Nevertheless, many branches of industry which cannot be
represented directly have been given indirect representation through other
series. Thus, “steel ingots fairly measure current movements in more
advanced stages of steel manufacture and less closely represent the broader
swings of manufacturing activity in industries making finished products
from steel.”⁹ However, the series in the manufacturing index are said to
represent, directly or indirectly, 80 per cent of all manufacturing industries.

The prices used as weight multipliers are derived from value figures as 
follows: 

1. Minerals. The total value of a given mineral produced during the 
three years 1923-1925, as reported by the Geological Survey of the United 
States Bureau of Mines, is divided by the total quantity produced in 
those same years. 

2. Manufactures. The total value added by manufacture in 1923, as 
reported by the Census of Manufactures, is divided by the appropriate 
quantity figure for that year. Value added by manufacture is taken in- 
stead of actual value, in order to avoid counting an item in both its raw 
and its manufactured state. 

Figures for 1923 are used solely because 1925 values were not available 
at the time this index was undertaken. Strictly speaking, each series is 
weighted not according to its own value but according to the relative im- 
portance of all industries that it represents in the index. In a number 
of instances the weighting of the different series was somewhat arbitrary. 

In Chapter XX it was stated that, in dealing with time series
representing physical volume, it is often desirable first of all to eliminate from the
series irregularities that are due to the varying number of calendar days
or working days in each month. In the case of most series, therefore, the
Federal Reserve Board first reduces its monthly figures to daily averages
by dividing them by the appropriate number of working days in the month.
It was found that the industries fell into three groups in respect to time of
operation: (1) those running continuously, such as pig iron blast furnaces,
non-ferrous metal smelters and refineries, and petroleum refineries; (2)
those closing on Sundays and certain important holidays, and operating
on the average about 310 days in the year; (3) those closing, in addition,
a half day on Saturdays, and operating about 280 days a year.

⁹ Federal Reserve Bulletin, March 1927, Vol. XIII, No. 3, p. 170. The writers are
indebted to the Division of Research and Statistics, Board of Governors of the Federal
Reserve System, for part of the information concerning this index. Since publication
of this text the index has been revised. The new base is the five-year average
1935-1939. The average of relatives method has replaced the aggregative method, and other
changes have been made. See Federal Reserve Bulletin, Vol. XXVI, August 1940,
pp. 763-771, and September 1940, pp. 912-923.

An aggregative index of volume, it will be recalled, is obtained by
multiplying the quantities of the various series in the base period and in the
given period by the same set of weights (which weights are prices),
summing these base period values (prices × quantities) and these given period
values separately, and dividing the latter by the former. As applied to
this particular index, the formula is

$$Q = \frac{\sum q_n\, p_{1923}}{\sum q_{1923-1925}\, p_{1923}}.$$

Shoe production in July 1937 amounted to 34,842,341 pairs. Since
there were 23.5 working days in that month in the shoe industry, the
average daily amount produced was 34,842,341 ÷ 23.5, or 1,482,653 pairs of
shoes. The average daily production for the three years 1923-1925 was
1,167,839 pairs. Since the derived price for shoe production was found
to be $1.25,¹⁰ that part of the numerator which refers to shoes is 1,482,653
× $1.25, or $1,853,316; and for the denominator it is 1,167,839 × $1.25,
or $1,459,799.
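The arithmetic for the shoe series, including the derivation of the price multiplier described in the accompanying footnote, can be retraced as follows. This is a sketch for illustration only. The variable names are ours, and the rounding of intermediate results follows what the text appears to do (the annual price rounded to $1.37 before the working-day adjustment).

```python
# Price multiplier: value added by manufacture per pair, 1923.
value_added_1923 = 480_000_000.0           # dollars
pairs_1923 = 351_114_273.0                 # pairs produced in 1923
annual_price = round(value_added_1923 / pairs_1923, 2)

# The shoe industry operates 282 days a year against the standard 310,
# so the multiplier for the monthly (daily average) index is scaled down.
monthly_price = round(annual_price * 282 / 310, 2)

# Daily average output, July 1937, and in the base period 1923-1925.
pairs_july_1937 = 34_842_341.0
working_days = 23.5
daily_july_1937 = round(pairs_july_1937 / working_days)
daily_base = 1_167_839.0

numerator = daily_july_1937 * monthly_price      # shoe part of the numerator
denominator = daily_base * monthly_price         # shoe part of the denominator
shoe_relative = 100 * numerator / denominator

print(annual_price, monthly_price)
print(daily_july_1937, round(numerator), round(denominator))
print(round(shoe_relative, 1))
```

Since the same price multiplies numerator and denominator, the relative for a single series depends only on the two quantities; the price matters when several series are added together.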

These two value figures, together with others obtained in a similar 
fashion, and computation of the index number of volume of production 
for July 1937 are illustrated in Table 154. It should be observed that the 
index number 113.5 for total leather and products is obtained, not from 
the two relatives for shoe production and leather tanning, but by relating 
the two totals for leather shown in this table. In like fashion, the index 
for total manufactures was constructed from totals for the various manu- 
factured products. Finally, the totals for manufactures and for minerals 
were combined to produce the index number of industrial production. 

In addition to the index described, the Federal Reserve Board makes a 
second index which is exactly the same, except that seasonal variations 

10 The value added by manufacture by the shoe industry was estimated to be
$480,000,000 in 1923. Shoe production in that year was 351,114,273 pairs. The
price multiplier was therefore $480,000,000 ÷ 351,114,273, or $1.37 for the annual
index numbers. However, since the shoe industry operates only 282 days a year
instead of 310 days, which was taken as standard, the price multiplier for the monthly
index was

$1.37 × (282 ÷ 310) = $1.25.

TABLE 154

Computation of Index Numbers of Volume of Industrial Production, July 1937
(Unadjusted index)

                                Σ q(1923-1925) p(1923)   Σ q(n) p(1923)   Quantity relative
Series                          (000 omitted)            (000 omitted)    or index number
                                                                          [Col. 3 ÷ Col. 2]
(1)                             (2)                      (3)              (4)

Leather tanning                 $   939,417              $   869,967       92.6
  All cattle hide leathers          536,133                  472,811       88.2
  Calf and kip leathers             201,347                  159,151       79.0
  Goat and kid leathers             201,466                  238,005      118.1
Shoe production                   1,459,799                1,853,316      127.0
Total leather and products        2,399,216                2,723,283      113.5
Total manufactures               60,639,571               66,991,269      110.4
Total minerals                   10,169,761               11,732,474      115.4
Total industrial production      70,809,332               78,723,743      111.2

Source: Division of Research and Statistics, Board of Governors of the Federal Reserve System.


are eliminated from the various series before the data are multiplied by 
the weights. Seasonal variation is eliminated by the moving average 
method described in Chapter XVII. Changing seasonal is allowed for 
when appropriate. 

Still another feature of these indexes should be mentioned. It was rec- 
ognized that the relative importance of the different industries in 1923 was 
far different from that in 1919. Therefore a second index was computed 
from 1919 to 1922 inclusive, with 1919 weights. This index was then 
averaged geometrically with the index employing 1923 (or 1923-1925) 
weights, and this average of the two indexes was considered the final index. 
For the year 1922, the index with 1923 weights was given a weight twice
as great in the average as that with 1919 weights; whereas for 1919, 1920,
and 1921, the two indexes were weighted equally. Strictly speaking, this
is not an illustration of the "ideal" index number formula, since exact
given year weights are not used. But the "ideal" index principle is used,
and the index numbers for the years 1919-1922 enjoy the advantages and 
suffer from the disadvantages of that method. 
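A minimal sketch of the geometric averaging just described, assuming the weights enter as exponents in the usual weighted geometric mean. The function name and the two sample index values are hypothetical; only the weighting scheme (two to one in 1922, equal in 1919-1921) comes from the text.

```python
import math


def blended_index(index_1919_weights, index_1923_weights, year):
    """Weighted geometric mean of the two indexes for the years 1919-1922."""
    # For 1922 the index with 1923 weights counts twice; for 1919-1921
    # the two indexes are weighted equally.
    w_new = 2 if year == 1922 else 1
    w_old = 1
    log_mean = (w_old * math.log(index_1919_weights)
                + w_new * math.log(index_1923_weights)) / (w_old + w_new)
    return math.exp(log_mean)


# Hypothetical index values for the same year under the two weight systems.
print(blended_index(80.0, 90.0, 1920))
print(blended_index(80.0, 90.0, 1922))
```

With the heavier weight in 1922 the blended figure is pulled toward the index with 1923 weights, which eases the transition to the years in which that index is used alone.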

Federal Reserve Bank of New York Monthly Index of Production and
Trade. This index¹¹ differs in its purpose from that of the Federal Reserve
Board Index of Industrial Production in two particulars: (1) It
measures the physical volume of trade as well as production. It includes
not only production (including construction), but everything for which
money is paid or for which checks are written, except purely financial
activity. The net result is an index with a comparatively small range of
fluctuation. Nevertheless, it seems likely that this index exaggerates these
fluctuations somewhat, since the series that cannot be obtained (such as
personal services) are those that are probably the most stable. (2) It
measures only cyclical movements, excluding secular and periodic movements.

11 This index is not published currently, but mimeographed releases will be mailed
upon request to the Federal Reserve Bank of New York.

All told, 82 series are used. Before being combined into the finished 
index, they are adjusted in several ways. As an illustration of the steps
involved, the series for mail order house sales has been selected. The
numerical illustration running through the following discussion is confined 
to the year 1937. In order to bring the whole procedure into view at 
one time, the various operations are summarized in Table 155. 

1. Nearly every series is adjusted to a working day basis; that is, each 
monthly figure has been divided by the number of working days in that month 
which are appropriate for the industry in question. For mail order house 
sales the following are not considered as working days: Sundays, New
Year's Day, Washington's Birthday, Memorial Day, Independence Day,
Labor Day, Thanksgiving, and Christmas. If a holiday occurs on Sunday, 
the holiday is taken on Monday. 

2. Seasonal is allowed for, the seasonal index usually being constructed 
by the per cent of 12-month-moving-average method. This is the method used 
for mail order house sales. Some attention has been given to changing 
seasonals, also, and in some of the retail series allowance has been made 
for the variable occurrence of Easter. 

3. Each dollar series is deflated — that is, adjusted for price change. Since 
it is difficult to obtain accurate deflating indexes, the use of series requir- 
ing adjustment for price changes has been avoided as much as practicable. 
Mail order house sales, however, necessitated such an adjustment, and a 
price index was specially constructed for that purpose. Even though the
price series may not be so accurate as might be desired, this procedure is
better than that of using dollar series entirely uncorrected for price changes.
The index with such series included is also more comprehensive than it 
would be if only physical volume series were used. 

4. Each series is expressed as per cent of normal. Trend is calculated 
by whatever method seems appropriate. A formula frequently used is 

Yc = bc^. 

This curve increases at a decreasing percentage rate. The equation for
the trend of mail order house sales, fitted to annual data for the years
1922-1932, inclusive, is a straight line fitted to logarithms:

log Y = 3.021189 + .039794X,

with origin at 1922, X stated in units of one year, and Y in thousands
of dollars per working day. Of course, it is necessary to extend the trend
a few years into the future in order to keep the index up to date.

For the method of fitting this curve, see "A Trend Line for Growth Series, Further
Remarks," by Norris O. Johnson, Journal of the American Statistical Association, Vol.
XXXI, December 1936, p. 731.

5. A few series are smoothed by moving averages. In the case of con- 
struction contracts a 6-month moving average is placed at the sixth month, 
since current construction is influenced by contracts let any time during 
the preceding half year, the influence becoming strongest as the current 
month is reached. No moving average is taken, however, of mail order 
house sales. 
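Steps 1 to 4, as applied to mail order house sales, can be sketched as follows. The trend constants are those of the logarithmic line quoted in step 4; that the logarithms are common (base 10), that the trend is compared with the deflated and deseasonalized figure, and the order in which the adjustments are applied are assumptions of this sketch. The function names and the sample inputs are hypothetical.

```python
def trend_per_working_day(year):
    """Trend of mail order house sales, in thousands of dollars per working day.

    Uses log Y = 3.021189 + 0.039794 X with origin at 1922 and X in years.
    Base-10 logarithms are assumed here.
    """
    x = year - 1922
    return 10 ** (3.021189 + 0.039794 * x)


def per_cent_of_normal(monthly_sales, working_days, seasonal_index,
                       price_index, year):
    """Cyclical relative for one month, with sales in thousands of dollars."""
    per_day = monthly_sales / working_days                 # step 1: working-day basis
    deseasonalized = per_day / (seasonal_index / 100)      # step 2: remove seasonal
    deflated = deseasonalized / (price_index / 100)        # step 3: remove price change
    return 100 * deflated / trend_per_working_day(year)    # step 4: per cent of normal


# Hypothetical month: 26 working days, seasonal index 110, price index 95.
print(per_cent_of_normal(110_000.0, 26, 110.0, 95.0, 1937))
```

A month whose adjusted daily sales fall exactly on the trend line returns 100; values above or below 100 are the cyclical relatives that enter the combined index.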

The 82 series, each of which is treated somewhat like mail order house 
sales, are combined into a general index of trade and production, and 
into four group indexes as follows: 

Production                      61 series
Primary distribution             9 series
Distribution to consumer         6 series
Miscellaneous services           6 series

Production and trade            82 series


The 61 series in the production group index are also combined into the
following subgroups, for each of which indexes are constructed.

                      Number of series
                 Producers    Consumers    All
                 goods        goods        goods

Durable             15            5         20
Non-durable         15           25         40
Total               30           30         60

Employee hours                               1
Production                                  61


Weights have been derived from Census data and have been based upon 
total value in trade. The value weight applied to an individual series is 
often that of a group of activities represented by the series in question, 
rather than the value in trade of the individual series itself. The weights 
are not fixed weights but themselves have a secular trend. The weights 
shown in Table 156, therefore, are those that apply only to the particular 
period in question. This table shows the method of combining the rela- 
tives into index numbers. It should be noticed that the weights used 
are not exact dollar weights, but only approximate weights in even
numbers which add up to 100. This facilitates the final combination of the
relatives into the finished index numbers. As has been noted, exact 
weights are not usually of tremendous importance, but they should be 
approximately correct. 

TABLE 156

Construction of Index Number of Production and Trade for December 1937

                                     Relative or    Weight of    Weighted
Series or group                      group index    relative     relative     Index
                                     number         or group     or group     number
(1)                                  (2)            (3)          (4)          (5)

Production
  Durable producers goods              58            10.9          632.2
  Non-durable producers goods          78            12.3          959.4
  Durable consumers goods              54             5.6          302.4
  Non-durable consumers goods          91            17.6        1,601.6
  Employee hours                       73             8.6          627.8
  Production                           ..            55.0        4,123.4       75
Primary distribution                   ..            16.1        1,304.1       81
Distribution to consumer
  Department store sales               83.8           8.9          745.8
  Chain store grocery sales            97.8           5.1          498.8
  Other chain store sales              95.2           3.5          333.2
  Mail order house sales               94.2           2.9          273.2
  Gasoline consumption                 97.0           3.2          310.4
  New passenger car registration       61.8           2.4          148.3
  Distribution to consumer             ..            26.0        2,309.7       89
Miscellaneous services                 ..             2.9          252.3       87
Production and trade                   ..           100.0        7,989.5       80

Source: Research Department, Federal Reserve Bank of New York.
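The combination shown in Table 156 is a weighted arithmetic mean of relatives, carried out first within groups and then over all groups. A short Python sketch, using the relatives and weights printed in the table (the names are illustrative only):

```python
# (relative, weight) pairs from Table 156, December 1937.
production = [(58, 10.9), (78, 12.3), (54, 5.6), (91, 17.6), (73, 8.6)]
consumer_distribution = [(83.8, 8.9), (97.8, 5.1), (95.2, 3.5),
                         (94.2, 2.9), (97.0, 3.2), (61.8, 2.4)]

# Groups for which the table gives only (weighted relative, weight).
primary_distribution = (1304.1, 16.1)
misc_services = (252.3, 2.9)


def weighted_sum(pairs):
    """Column 4 total: sum of relative times weight."""
    return sum(rel * w for rel, w in pairs)


def group_index(pairs):
    """Column 5: weighted sum divided by the sum of the weights."""
    return weighted_sum(pairs) / sum(w for _, w in pairs)


total_weighted = (weighted_sum(production)
                  + weighted_sum(consumer_distribution)
                  + primary_distribution[0] + misc_services[0])
total_weight = (sum(w for _, w in production)
                + sum(w for _, w in consumer_distribution)
                + primary_distribution[1] + misc_services[1])
overall_index = total_weighted / total_weight

print(round(group_index(production)))
print(round(group_index(consumer_distribution)))
print(round(overall_index))
```

Because the weights add to 100, the final division merely shifts the decimal point in the column 4 total, which is the convenience the text refers to.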


A major difference between this index and Snyder's Index of the Volume
of Trade (of which this is a revision, although it was not made under his 
supervision) is that, while Snyder’s index was intended to cover every- 
thing for which money is spent, the present index omits financial activity. 
There were also other changes in the list of series, as well as changes in 
weights. Another important source of variation between the two indexes 
is due to the revision of the trends. The old trend lines, which were fitted 
before the world depression, were higher during recent years than seemed 
reasonable. A refitting of the trends to more recent data has had the
effect of lowering the level of the trends for recent years, and therefore of
raising the index numbers. 

Indexes of business cycles. There are a number of indexes that at- 
tempt to measure the cyclical swings of economic activity. The chief dif- 
ferences among these indexes have to do with the data used, the method 
of computing trend, and the system of weighting. Brief mention will be 
made of a few of these indexes. 

We have seen that the Federal Reserve Bank of New York monthly 
index eliminates the trend and seasonal from each series separately, and 
weights the cyclical relatives according to their importance in trade, the 
weights being revised as the relative importance of the different series 
gradually changes. None of the indexes below follow exactly this pro- 
cedure. 

Barron's Index of Production and Trade. The technique applied by
Warren M. Persons in constructing Barron's Index of Production and
Trade is interesting. For the annual index, six series are combined:

(1) The Persons-Day-Thomas index of the physical volume of manu- 
facturing output. 

(2) Mineral output. 

(3) Building construction. 

(4) Electric power production. 

(5) Railroad freight traffic. 

(6) Wholesale and retail trade. 

The index of manufacturing output (described in Census Monograph VIII, 
1928, The Growth of Manufactures) is so constructed that, for years cov- 
ered by the Census of Manufactures, it coincides with the Census data. 

The different series are expressed as quantity relatives with 1923-1925 
as the base. They are then combined into an index by the use of Fisher's
"ideal" index number formula. Since the data are relatives, the formula is



The index is thus the geometric mean of two separate indexes: (1) the
arithmetic mean of quantity relatives weighted by base year values; (2)
the harmonic mean of quantity relatives weighted by given year values.
The weights are intended to represent the percentages which the national
income from the individual groups (represented by the six series) bears to
the total income from all six groups.

For further description of this index, see "New Indexes of Production and Trade,"
by Norris O. Johnson, Journal of the American Statistical Association, Vol. 33, June
1938, pp. 341-348. For a description of Snyder's index, see F. E. Croxton and D. J.
Cowden, Practical Business Statistics, pp. 390-394, Prentice-Hall, Inc., New York, 1934.
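In symbols, the verbal description of the Barron's formula may be written as follows. The notation is assumed here and is not taken from the source: q_0 and q_n are the base period and given year quantities of a series, w_0 its base year value weight, and w_n its given year value weight.

```latex
Q = \sqrt{
      \frac{\sum w_0 \, (q_n / q_0)}{\sum w_0}
      \times
      \frac{\sum w_n}{\sum w_n \, (q_0 / q_n)}
    }
```

The first factor is the weighted arithmetic mean of the quantity relatives and the second is their weighted harmonic mean. If the weights were exact values, w_0 = p_0 q_0 and w_n = p_n q_n, the two factors would reduce to \sum p_0 q_n / \sum p_0 q_0 and \sum p_n q_n / \sum p_n q_0, which is the familiar aggregative form of the "ideal" index.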

A trend is now fitted to the combined data. The method, as explained 
on pp. 411-412, consists of: (1) adjusting the data for population changes; 
(2) fitting a straight line to these adjusted data; (3) multiplying these 
straight line values by the estimated population relative to 1923-1925. 

A monthly index, using the same series, is computed in a manner similar 
to that used for the annual index. Seasonal is eliminated for each series 
separately. The long time trend of the annual index is used for eliminat- 
ing the trend from the monthly index. A weekly index, based upon less 
comprehensive data, is also constructed. The data are deseasonalized by 
dividing each series by its weekly seasonal index. The weights used in 
combining the series are averages of the base period and given year weights 
used in the monthly index. The final index is a weighted geometric aver- 
age of the different series. Both the monthly and weekly indexes are 
available in two forms since 1919: (1) deseasonalized; (2) adjusted for 
trend and seasonal. 

Index of Business Activity in Buffalo. This is a local index and is pub- 
lished monthly by the University of Buffalo Bureau of Business and Social 
Research, in its Statistical Survey, under the direction of M. A. Brumbaugh. 
Seven series of local importance are selected. These are adjusted for price 
changes where appropriate; for variation in calendar, business, or working 
days; and for secular trend. The trend value for a particular month is 
the last value of a straight line fitted to the last 19 years for that series. 
It is thus a moving 19-year straight line, the final rather than the central 
value of which is taken as the trend value. This method of computing a 
trend has the advantage of flexibility; and there is not the necessity for 
occasional revisions which change earlier trend values. On the other hand, 
less reliance can usually be placed on the end values of a trend fitted by 
the method of least squares than on the central values, and with the Buffalo 
index the trend is composed entirely of end values. On the average, the 
secular trend continues upward until the middle of 1931, after which it 
turns downward, and flattens out during 1935. 
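The moving straight-line trend can be sketched in Python as below. The least-squares line is fitted to the most recent 19 observations and only its final ordinate is kept. Treating the series as one figure per year is an assumption of the sketch, and the function names are hypothetical.

```python
def line_end_value(values):
    """Fit a least-squares straight line to `values` and return its last ordinate."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    # Ordinate of the fitted line at the final observation (x = n - 1).
    return y_mean + slope * ((n - 1) - x_mean)


def moving_end_trend(series, window=19):
    """Trend value for each period once `window` observations are available."""
    return [line_end_value(series[i - window + 1:i + 1])
            for i in range(window - 1, len(series))]


# Hypothetical series of 25 annual figures.
sample = [100 + 3 * t + (5 if t % 2 else -5) for t in range(25)]
print(moving_end_trend(sample))
```

Each new observation adds one trend value without disturbing those already computed, which is the flexibility noted in the text. The cost is that every value is an end value of its own line.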

The weighting of each series is directly in proportion to its economic
significance, and inversely in proportion to its variability. Economic
significance of each series is determined by its representativeness of general
business conditions, the amount of employment and buying power it
represents, the value of its product, and other similar factors. As a
measure of variability the average deviation is used. The reason a series which
has great amplitude of fluctuation is reduced in weight is that otherwise
such a series might tend to dominate the index.

For a more complete description of these indexes, see "Gauging Business Activity,"
by Warren M. Persons, Barron's, January 18, 1937, p. 3. Additional references are
given in that article.


Computation of Index Number of Business Activity in Buffalo from Cyclical
Relatives, January 1938

                              Economic       Average      Weight        Weight on   Cyclical   Weighted
Series                        significance   deviation    [Col. 2 ÷     base of     relative   relative
                                                          Col. 3]       1.00
(1)                           (2)            (3)          (4)           (5)         (6)        (7)

Bank debits                    35            12.001       2.916         .35         100.0      35.00
Flour milling                   5            11.871        .421         .05          88.4       4.42
Employment                     25            18.765       1.332         .16          77.6      12.42
Postal receipts                 2            12.388        .161         .02          85.1       1.70
New autos registered            6            25.845        .232         .03          75.6       2.27
Department store sales         15             6.306       2.379         .28         100.6      28.17
Electric power consumption     12            12.050        .996         .11          92.7      10.20

Total                         100              ..         8.437        1.00           ..       94.18
Index number                                                                                   94.18

Source: University of Buffalo Bureau of Business and Social Research, Statistical Survey Supplement,
Vol. XIII, No. 8A, April 1938, p. 6, Table I.
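The weighting in the Buffalo table can be reproduced with the short Python sketch below. The inputs are the figures of columns 2, 3, and 6; the variable names are illustrative. The sketch carries the weights at full precision, whereas the table rounds them to two decimals, so the result agrees with the published 94.18 only to within a small rounding difference.

```python
# (economic significance, average deviation, cyclical relative), January 1938.
series = {
    "Bank debits": (35, 12.001, 100.0),
    "Flour milling": (5, 11.871, 88.4),
    "Employment": (25, 18.765, 77.6),
    "Postal receipts": (2, 12.388, 85.1),
    "New autos registered": (6, 25.845, 75.6),
    "Department store sales": (15, 6.306, 100.6),
    "Electric power consumption": (12, 12.050, 92.7),
}

# Column 4: significance divided by variability.
raw = {name: sig / dev for name, (sig, dev, _) in series.items()}
total_raw = sum(raw.values())

# Column 5: the same weights rescaled to add to 1.00 (unrounded here).
weights = {name: r / total_raw for name, r in raw.items()}

# Column 7 total: weighted average of the cyclical relatives.
index = sum(weights[name] * rel for name, (_, _, rel) in series.items())

print(round(total_raw, 3))
print(round(index, 2))
```

Department store sales, with the smallest average deviation, ends with a weight of about .28 although its economic significance is only 15 out of 100; new automobile registrations, the most variable series, falls from 6 to about .03.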


The accompanying table illustrates the method of weighting the cyclical 
relatives. An interesting additional feature of this index is that the dif- 
ferent trends are expressed as percentages of 1927 and are averaged to- 
gether (weighted according to their economic significance), thus producing 
a composite trend. In showing the index graphically, two lines are shown 
on one chart: (1) the composite trend index; (2) an index which is the result
of multiplying the trend index times the cycle index. The area between
the trend line and the trend × cycle line is shaded, so that the
resulting silhouette indicates the cycle index. 

The idea of allowing the stability of a series to affect its weight is not 
new. A number of other organizations have been using the idea for some 
time in constructing indexes of cyclical fluctuations of business. Two 
such indexes are described in the following paragraphs. 

This description is largely a summary of the information appearing in the
University of Buffalo Bureau of Business Research, Statistical Survey, Supplement, Vol.
XIII, No. 8A, April 1938.

The New York Times Weekly Index of Business Activity. The New York
Times compiles a weekly index composed of seven series. It assigns to
each series an effective weight based upon its relative importance as a 
business indicator and its reliability, or freedom from erratic or non- 
business influences. In order to preserve the effectiveness of these weights, 
each effective weight is divided by a measure of the cyclical variability of 
the series. This measure is the annual average percentage deviation in 
the cyclical data of the extreme months from the mean of the high and 
low months. The series used and their weights are as follows: 


                                 Effective    Average annual    [Col. 2 ÷    Adjusted
Series                           weight       range             Col. 3]      weight
(1)                              (2)          (3)               (4)          (5)

Steel ingot production            25           38                .66          .10
Electric power production         20            6               3.33          .49
Miscellaneous car loadings        18           14               1.29          .19
Automobile production             10           56                .18          .03
Lumber production                 10           23                .43          .06
Cotton mill activity              10           31                .32          .05
Other car loadings                 7           13                .54          .08

Total                            100           ..               6.75         1.00
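The adjusted weights in the last column follow from the first two by a division and a rescaling, as this Python sketch shows. Rounding the quotients to two decimals before rescaling is inferred from the printed figures; the variable names are illustrative.

```python
# Columns 2 and 3 of the table, in the order the series are listed.
effective_weight = [25, 20, 18, 10, 10, 10, 7]
average_annual_range = [38, 6, 14, 56, 23, 31, 13]

# Column 4: effective weight divided by the measure of cyclical variability.
quotients = [round(w / r, 2)
             for w, r in zip(effective_weight, average_annual_range)]

# Column 5: quotients rescaled to add to 1.00.
total = sum(quotients)
adjusted_weight = [round(q / total, 2) for q in quotients]

print(quotients)
print(adjusted_weight)
```

The adjusted weight is the multiplier actually applied to each cyclical relative. Because a series with a wide cyclical range contributes large deviations, a small multiplier is needed for its influence on the combined index to correspond to the effective weight intended for it.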


Each series, before being combined, is adjusted for variation due to 
working days, for weekly seasonal movements, and for trend. An interest- 
ing feature is that, for most series, secular trend for several years following 
1929 was considered inoperative, and a horizontal line used instead. For 
some of the series the upward trend has now been resumed. More
specifically, the trends or "normal" values are as follows:

Steel ingot production: 69 per cent of capacity. 

Electric power production: Average daily electric power production ad- 
justed for seasonal variation is divided by the adjusted index of steel ingot 
production with its amplitude reduced to one-fifth. These monthly figures 
are smoothed graphically. The reason for reducing the amplitude of steel 
ingot production to one-fifth is that the cyclical amplitude of this series is 
about five times that of electric power production. This method of ob- 
taining trend is based upon three propositions: (1) that a trend should go 
approximately through the center of the different cycles; (2) that a proper 
trend has been discovered for steel ingot production; (3) that the cyclical 
movements of electric power production and steel ingot production are 
similar. 


For a detailed description of this index, see The New York Times Weekly Index of
Business Activity as Revised July 6, 1936. This pamphlet will be sent upon request to
the New York Times.





Car loadings: The trend of each series (miscellaneous car loadings and 
other car loadings) was computed in a fashion similar to that for steel ingot 
production. In each case this produced a downward trend during the 
years 1930-1932. 

Automobile production: Average daily production for the period 1927- 
1930. 

Lumber production: Average daily production for the period 1929-1931. 

Cotton mill activity: Based on percentage of capacity, after a fashion 
similar to steel ingot production. 

The American Telephone and Telegraph Company Index of Industrial 
Activity. The American Telephone and Telegraph Company expresses its 
cyclical deviations in terms of standard deviations. Each series, when so 
expressed, varies from approximately +3 standard deviations to
approximately -3 standard deviations. The series are then averaged together,
each being weighted in proportion to its value as a representative of busi- 
ness conditions. Since the weighted average of all the standard deviations 
is approximately 10 per cent, each index number is multiplied by 10. Thus, 
if the index stands at -1.3 standard deviations, it is stated as 13 per cent
below normal. 
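A minimal sketch of this conversion, assuming each series has already been expressed in units of its own standard deviation. The function name, the sample figures, and the weights are hypothetical.

```python
def per_cent_from_normal(standardized_values, weights):
    """Weighted average of standardized deviations, stated in per cent.

    Each standard deviation is counted as 10 per cent, as in the text.
    """
    weighted = sum(w * z for z, w in zip(standardized_values, weights))
    return 10 * weighted / sum(weights)


# Three hypothetical series for one month, in standard deviation units.
print(per_cent_from_normal([-1.0, -1.5, -1.6], [3, 2, 1]))
```

Expressing every series in standard deviation units before averaging gives each the same amplitude, so that the weights alone determine its influence on the combined index.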

Cleveland Trust Company Index of American Business Activity since 1790. 
This is the most extensive cycle index, as it extends from 1790 to the 
present. Because it was increasingly difficult to find an adequate num- 
ber of satisfactory series for the earlier years, it was necessary to splice 
together several sets of annual series over the span covered. 

(1) From 1790 to 1855, 10 series were used. 

(2) From 1855 to 1901, 10 different series were used. 

(3) From 1901 to 1919, the Persons-Day-Thomas index of manufac- 
turing production, with mineral production added, was used. 

(4) From 1919 to date the Federal Reserve index of industrial produc- 
tion was used. 

Each series was reduced to a per capita basis and adjusted for trend. 
Extensive use was made of the high-low mid-point method of computing 
trend (see pp. 412-418). 

The method of obtaining weights is unusual. Each of the 10 series
in the first set was extended through 1882, and each of the 10 series in the
second set was extended through 1930. Then each of the earlier series
was compared with the index numbers of the latter during the period of
overlapping. The weights assigned to the different series in the earlier
period were in proportion to their closeness of correspondence to the
movements of the index numbers. (Technically, the weights assigned were in
proportion to the coefficients of correlation which were computed. See
Chapter XXII for an explanation of correlation.) Before the different
series were averaged together, they were put in terms of their average
deviations.

For further details, see W. C. Mitchell, Business Cycles: The Problem and Its Setting,
pp. 295, 328, National Bureau of Economic Research, New York, 1927.

Greater difficulty still was occasioned in obtaining monthly data; from 
1790 to 1815, for instance, it was necessary to rely entirely on commodity 
prices. The different monthly series selected were fitted to the annual 
index numbers so as to make the average monthly values of the former 
coincide with the annual index numbers. Large silhouette charts (such 
as Chart 24), as well as the monthly index numbers, which are deviations 
from normal, are published by the Cleveland Trust Company, of Cleveland,
Ohio, and are available upon request. A description of the index by its
author, Col. Leonard P. Ayres, is also on the large chart. 

Indexes of Qualitative Changes or Differences 

Adequacy of State Care of Mental Patients. Such an index comparing 
the different states was constructed for the years 1922 and 1933 by Ellen 
Winston and published in the American Sociological Review of April 1938.¹⁸

The index is a simple average of five sets of relatives: 

(1) Nurses and attendants per 1,000 average daily resident patient
population (125 nurses and attendants per 1,000 patients = 100).

(2) Physicians per average daily resident patient population (6.67 
physicians per 1,000 patients = 100). 

(3) Physicians per annual admissions (25 physicians per 1,000 admis- 
sions = 100). 

(4) Annual cost of maintenance per average daily resident patient 
population ($312 = 100). 

(5) Value of hospital property per average daily resident patient popu- 
lation ($1,500 = 100). 

Two group indexes were also computed, the first three series being aver- 
aged together to obtain an index of personnel in state hospitals, and the 
last two to obtain an index of expenditures of state hospitals. These two 
group indexes corresponded closely with each other. 
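The construction can be sketched in Python as below. The five standards are those listed above; the dictionary keys, the function name, and the figures for the sample state are hypothetical.

```python
# Standard values that correspond to a relative of 100, from the list above.
STANDARDS = {
    "nurses_per_1000_patients": 125,
    "physicians_per_1000_patients": 6.67,
    "physicians_per_1000_admissions": 25,
    "maintenance_cost_per_patient": 312,
    "property_value_per_patient": 1500,
}
PERSONNEL = list(STANDARDS)[:3]      # first three series
EXPENDITURES = list(STANDARDS)[3:]   # last two series


def adequacy_indexes(state):
    """Return (overall, personnel, expenditures) indexes for one state."""
    rel = {k: 100 * state[k] / STANDARDS[k] for k in STANDARDS}
    overall = sum(rel.values()) / len(rel)
    personnel = sum(rel[k] for k in PERSONNEL) / len(PERSONNEL)
    expenditures = sum(rel[k] for k in EXPENDITURES) / len(EXPENDITURES)
    return overall, personnel, expenditures


# Hypothetical state.
sample_state = {
    "nurses_per_1000_patients": 100,
    "physicians_per_1000_patients": 5.0,
    "physicians_per_1000_admissions": 20,
    "maintenance_cost_per_patient": 280,
    "property_value_per_patient": 1350,
}
print(adequacy_indexes(sample_state))
```

Since every relative is measured against a fixed standard rather than against another state or year, the same function serves for comparisons among states and between 1922 and 1933.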

Since the same standard or base was used for 1922 and 1933, it is pos- 
sible not only to compare the different states for the same year, but also 
to compare the adequacy of a given state's care in the two years. No
index number, however, is computed for the United States as a whole.

The fairness of the comparisons between states and between periods of
time is somewhat impaired by the fact that no adjustment is made in the
two financial series for variations in the value of the dollar between states
or between years.

18 "Indices of Adequacy of State Care of Mental Patients," by Ellen Winston,
American Sociological Review, Vol. 3, April 1938, pp. 190-202.

Measures of adequacy of state school systems. All but one of the 
preceding illustrations have dealt with measurements of primary interest 
to economists, and for the most part have dealt with comparisons over a 
period of time. This concluding section on index numbers will have to 
do with school systems, and will describe some index numbers which com- 
pare the adequacy of such systems in the different states. Although no 
very satisfactory measure has yet been devised, a number have been un- 
dertaken, and the following pages will illustrate some interesting varia- 
tions in procedure. 

Ayres' Index of State School Systems. Probably the first comprehensive
index of school systems was undertaken by Leonard P. Ayres in 1912. A
revision was made in 1920.¹⁹ In the revised index ten series of data were
averaged together for each state, five of the items having to do with
attendance and five with financial matters. These items and the multipliers
used to reduce them to a comparable basis are:


     Measure of Adequacy                                                  Multiplier

 1.  Per cent of school population attending public schools daily          1
 2.  Average number of days attended by each child of school age           1/2
 3.  Average number of days public schools were kept open                  1/2
 4.  Per cent that high school enrollment is of total enrollment
     (2.75 multiplier in states with 11-year systems)                      3
 5.  Per cent that boys are of girls in high schools                       1
 6.  Average annual expenditure per child attending                        1
 7.  Average annual expenditure for each child of school age               1
 8.  Average annual expenditure per teacher employed                       1/24
 9.  Expenditure per pupil for purposes other than salaries                2
10.  Expenditure per teacher for salaries                                  1/12


The object of these multiplications or divisions was to express each series 
for each state as a per cent of some standard which was considered
desirable. Thus 200 days was considered the standard length of school year;
consequently if a school in a particular state kept open 200 days, that
state would have a rating of 100 with respect to item 3. It was possible,
and it occasionally happened, that a state exceeded the standard set for a 
particular item and therefore rated higher than 100. The index numbers 
were simple arithmetic averages of these relatives. Index numbers were
computed for the United States as a whole for each year from 1871 through
1918, and for each state for the years 1890, 1900, 1910, 1916, and 1918.
Since the index has been brought up to date by Frank M. Phillips, index
numbers for each state are now available for the additional years 1920
and 1922.

19 See Russell Sage Foundation, Circular No. 124, A Comparative Study of Public
School Education in the 48 States; and Leonard P. Ayres, An Index Number for State
School Systems, Russell Sage Foundation, New York, 1920.

In addition to bringing the Ayres index up to date, Phillips made a 
revision employing an additional technique for the years 1910, 1918, 1920, 
1922, 1924, and 1930. This consisted in adjusting the five financial series
for changes in cost of living. Each series was deflated by dividing by the 
United States Bureau of Labor Statistics Cost of Living Index (for 1910, 
the Retail Food Price Index was used). This adjustment was for the 
purpose of making the index numbers of state school systems more com- 
parable over time, so that an increase in money expenditures would pro- 
duce an increase in the index number only to the extent that the former 
meant an increase in real expenditures also. The effect of the adjustment 
has been that, both for the United States as a whole and for the different 
states, the school systems have shown considerably less improvement over
time. Thus for the United States as a whole we have the following results:


Year        Original Index      Revised Index

1910            42.41               43.55
1918            51.01               44.34
1920            59.42               44.73
1922            74.50               57.15


The revised index is designed to show two-way comparisons: among 
states, and over time. The comparisons among states are not completely 
valid, however. Although the financial items have been adjusted for 
changes in the cost of living over time, they have not been adjusted for 
differences in cost of living among states. It is also true that costs of 
living vary directly with density of population and degree of urbanization.
Phillips took cognizance of the density and urbanization factors by publishing
a supplementary table in which the states are classified into groups on
these bases and the rank of the different states in each group is shown.²⁰

Phillips' Index of Educational Rank. Phillips also computed a new
index based upon the following items: 

(1) Percentage of illiterates in the population ten years of age and over.

(2) Ratio of the number of children in average daily attendance in 
public schools to the number of age 5 to 17 inclusive. 

(3) Per cent that high school enrollment is of total enrollment. 

(4) Average number of days attended by each child enrolled.

(5) Average number of days schools were kept open.

(6) Ratio of the number of students taking teacher-training courses to
the number of teaching positions. (For the 1930 index a different series
was substituted.)

(7) Per cent of high school graduates continuing their education.

(8) Total cost, excluding salaries, per pupil in average daily attendance.

(9) Average annual salary of teachers, principals, and supervisors.

(10) Total amount expended per child of school age.

20 See Frank M. Phillips, Educational Ranking of States by Two Methods, Bruce
Publishing Company, Milwaukee, 1925; and "Educational Rank of States, 1930," The
American School Board Journal, February, March, and April, 1932, Vol. 84, Nos. 2, 3, 4.

It is seen that Phillips retains a number of Ayres’ series. He adds a
series on illiteracy, one on teacher training, and one on higher education. 
On the other hand, he reduces the number of financial series to three. 
These are not adjusted for changes in the cost of living, since the Phillips 
index is designed to make comparisons among states for a given year only. 
The method of constructing the index is different also. The states are 
first ranked separately with respect to each given criterion; then the ranks 
are summed, state by state; finally the states are ranked according to the 
sum of their ranks. For instance, in 1930 Washington ranked as follows 
with respect to the ten different criteria: 3, 4, 1, 28, 13, 7, 20, 18, 14, 11. 
These ranks total 119. Since this was the lowest total, Washington was 
first in educational rank among the states. 
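
The rank-sum procedure just described can be sketched in a few lines. This is only an illustration: Washington's ten ranks are taken from the text, while the two other "states" and the function name rank_sum_index are hypothetical and are included only so that there is something to rank against.

```python
# Sketch of a rank-sum index of the kind described above.
# Only Washington's ranks come from the text; the other rows are made up.

def rank_sum_index(ranks_by_state):
    """Sum each state's ranks, then rank the states by that sum.

    The lowest total receives position 1. States with equal totals
    share the same position.
    """
    totals = {state: sum(ranks) for state, ranks in ranks_by_state.items()}
    ordered_totals = sorted(totals.values())
    positions = {state: ordered_totals.index(total) + 1
                 for state, total in totals.items()}
    return totals, positions


ranks_by_state = {
    "Washington": [3, 4, 1, 28, 13, 7, 20, 18, 14, 11],            # from the text
    "State A (hypothetical)": [10, 12, 9, 15, 20, 11, 14, 16, 18, 13],
    "State B (hypothetical)": [30, 25, 28, 5, 22, 31, 9, 27, 26, 33],
}

totals, positions = rank_sum_index(ranks_by_state)
for state in sorted(positions, key=positions.get):
    print(positions[state], state, totals[state])
```

Because only ranks are added, a state's standing depends on its order with respect to each criterion and not on how far it leads or trails the others.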

Item 6, the ratio of teacher-training students to teaching positions, was 
discarded in 1930, since a large ratio may in practice sometimes mean 
that the teaching field is being overcrowded by certifying poorly qualified 
teachers. In place of this item there was substituted, where obtainable, 
the per cent of teachers employed who had at least two years’ training 
beyond high school graduation; where these data could not be obtained, 
the ratio of teachers to students was used. Item 7, the per cent of high 
school graduates continuing their education, is not very satisfactory either.
In general, a state that ranks high with respect to this criterion ranks low 
with respect to the others. In technical language, there is negative corre- 
lation between this criterion and each of the others. (See Chapter XXII.) 
The illiteracy series is still another which is not considered appropriate by 
some authorities. 

Rankings are available for the sum of ranks for 1910, 1918, 1920, 1922, 
1924, and 1930, and for each criterion for most of these years. 

N.E.A. Ranking. The Research Division of the National Education
Association has selected five criteria for judging the adequacy of school 
systems. Only those series have been selected that are widely accepted 

21 Ibid.; also “Educational Ranking of States by Two Methods,” by Frank M.
Phillips, American School Board Journal, Vol. 69, December 1924, pp. 47-49.

as valid, whose validity is supported by the data available, that are reasonably
reliable, and for which comparable nation-wide figures are available.
No attempt has been made to combine the different series into an
index, since information is not available for their proper weighting. The 
five series selected were : 

(1) The proportion of children reached by the services of the schools, 
measured by the ratio of actual number of student days in school to the 
standard number of student days considered desirable.

(2) The holding power of schools, measured by the ratio of children 
aged 14-17 (high school age) attending school to the number of children 
aged 14-17. 

(3) The quality of teaching provided, measured by the average salary 
paid teachers, principals, and supervisors. 

(4) The material school environment, measured by the value of school 
property per child enrolled. 

(5) The per cent of literacy among native-born population over 10 
years of age. 

The states are ranked for the year 1930 with respect to each of these 
criteria; the table of rankings was published in the National Education
Association Research Bulletin, May 1932 (p. 126).

N.E.A. Index of Financial Adequacy. Confronted with the question of
whether federal aid to states for education was desirable, the Research 
Division of the National Education Association undertook to construct 
measures of effort expended by states for education and the adequacy of 
the results obtained. The Association abandoned any attempt to measure 
adequacy by combining separate measures of the type we have been describing,
and substituted instead the following ratio:

        Amount spent for education
        --------------------------
        Units of educational need

The numerator of this fraction is: total expenditures for current expenses,
exclusive of interest (which is not available for all states), plus
cost of state department of education, less amount of expenditures received
from the Federal Government and from subsidies. The denominator
is Mort’s Index of Educational Need.²²

The method followed by Mort in constructing his index is very in- 
genious. The unit of educational need is one student attending elementary 
schools daily. It is recognized, however, that it is more expensive to 
support a school system in a rural community than it is in an urban community,

22 “An Objective Basis for the Distribution of Federal Support to Public Education,”
by Paul P. Mort, Teachers College Record, Vol. XXXVI, November 1934, pp. 91-110.

and more expensive to support the education of a high school
student than that of an elementary school student. Consequently certain 
adjustments must be made. First, the educational need attributable to a 
high school student is considered 1.7 times that of an elementary student. 
The adjustment for additional need attributable to residence in a rural 
territory is more complicated. A community is considered rural territory 
if it has 2,500 population or less. As a measure of degree of rurality for a
given state, the number of acres of farm land per inhabitant 5 to
20 years of age living in rural territory is computed. Let us call this the
rurality ratio. The correction factor for additional need due to rurality is
obtained by taking the sum of 1.22 plus four-thousandths times the rurality
ratio (provided, however, that the correction factor shall not exceed 1.70).
This may be expressed as the following equation, in which Yc is the
correction factor for a state, and X is the rurality ratio:

Yc = 1.22 + .004X,

with the maximum Yc value of 1.70. It is evident that a high school
student in daily attendance in a rural territory in a state with the maximum
rurality ratio constitutes an educational need of 1.7 × 1.70 = 2.89,
compared with a value of 1.00 for an elementary student in an urban territory
in a state with a minimum rurality ratio. 
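
The two weighting rules above can be put into a short sketch. The function names are hypothetical, and the sketch assumes that the rurality correction is applied only to pupils in rural territory; the cost-of-living correction discussed next is left out.

```python
# Sketch of the pupil weights described above (names are illustrative only).

HIGH_SCHOOL_WEIGHT = 1.7   # a high school pupil counts 1.7 elementary pupils
MAX_CORRECTION = 1.70      # ceiling on the rurality correction factor


def rurality_correction(rurality_ratio):
    """Yc = 1.22 + .004X, capped at 1.70 (X = farm acres per rural inhabitant 5-20)."""
    return min(MAX_CORRECTION, 1.22 + 0.004 * rurality_ratio)


def need_units(level, rural, rurality_ratio=0.0):
    """Units of educational need for one pupil in average daily attendance.

    Assumption: the rurality correction applies only when `rural` is True.
    """
    weight = HIGH_SCHOOL_WEIGHT if level == "high" else 1.0
    if rural:
        weight *= rurality_correction(rurality_ratio)
    return weight


# The cap is reached when 1.22 + .004X = 1.70, that is, at X = 120.
print(rurality_correction(120))                     # at or just above the cap
print(need_units("high", True, 200))                # 1.7 x 1.70
print(need_units("elementary", False))              # baseline unit
```

The first two printed values may show small floating-point noise; they correspond to the 1.70 ceiling and the 2.89 figure given in the text.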

An additional correction is made in the Mort Index of Educational Need 
for variation in cost of living as between communities of different size. 
The correction for cost of living varies from 30 per cent for communities 
of more than 500,000 population to no correction for communities of less 
than 10,000 population. This correction is not entirely satisfactory since 
it does not allow for variation among different sections of the United 
States. The state index numbers of educational need are deflated in the 
usual fashion by the cost of living index. 

The index of financial adequacy obtained by dividing the amount spent 
for education by Mort’s index is especially suitable for the purpose for
which the index is intended. If it can be shown that the financial ade- 
quacy of a state system bears little relationship to the amount of financial 
effort or sacrifice which a state makes, that constitutes a good talking 
point in favor of Federal aid to states for education. Effort is defined as 
“The extent to which a state extends itself to support education in terms
of its financial ability,” and is measured by the ratio:

        Amount spent for education
        --------------------------
           Financial resources

The measurement of financial resources itself constitutes an interesting 
problem in index number construction, but will not be discussed in this 
volume. It is the conclusion of the National Education Association that
these measures constitute a valid argument for federal aid.²³

Sources of Current Index Numbers 

A considerable proportion of the most useful economic indexes that are 
published currently can be found in one or more of the following period- 
icals : 

(1) Survey of Current Business, published by the United States Department
of Commerce, Bureau of Foreign and Domestic Commerce.
Back data are available in the different annual supplements.

(2) Federal Reserve Bulletin, published by the Board of Governors of
the Federal Reserve System. 

(3) Standard Trade and Securities Basic Statistics, Volume 3, Statistical 
Section, and Current Statistics; published by Standard Statistics Company, 
Inc. 

Other sources of statistical data, many of which publish index numbers,
will be found in Appendix A. For a comprehensive list of indexes, together
with brief description and sources, the reader is referred to Donald
H. Davenport and Frances V. Scott, An Index to Business Indexes, Business
Publications, Inc., Chicago, 1937.

23 See “The Efforts of the States to Support Education,” in National Education Association
Research Bulletin, Vol. XIV, May 1936, pp. 103-163. The issue was prepared
by Lyle W. Ashby.

One of the chief objectives of science is to estimate values of one factor
by reference to the values of an associated factor. “The scientific method
. . . consists in the careful and laborious classification of facts, in the
comparison of their relationship and sequences, and finally in the discovery
by the aid of disciplined imagination of a brief statement or formula,
which in a few words resumes a wide range of facts. Such a formula . . .
is termed a scientific law.”¹ When the relationship is of a quantitative
nature, the appropriate statistical tool for discovering and measuring the 
relationship and expressing it in a brief formula is known as correlation. 

A Simple Explanation 

It may surprise some of us to know that there is a very close relation- 
ship between temperature and the frequency with which crickets chirp. 
If, for instance, we should count the number of chirps made by a cricket 
in 15 seconds and add it to 37, we could closely approximate the Fahrenheit 
temperature at that time. Or, if we should multiply the degrees Fahren- 
heit by 3.78 and subtract 137 from the result, we could estimate the num- 
ber of chirps to be expected from a cricket in one minute. This relation- 
ship would be found remarkably accurate, unless the temperature was 
below 45°. When the weather is colder than 45°, crickets do not chirp. 
Likewise, it might not be accurate appreciably beyond 80° since observa- 
tions have not been made beyond that temperature and we do not know, 
therefore, if the relationship holds for higher temperatures. 
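
The two rules of thumb can be written as small functions. This is a sketch only: the function names are invented here, and the range check simply encodes the 45° to 80° limits mentioned above rather than anything measured.

```python
# Sketch of the two cricket rules of thumb quoted above.

MIN_TEMP_F = 45.0   # below this, crickets do not chirp
MAX_TEMP_F = 80.0   # no observations were made beyond this temperature


def temperature_from_chirps(chirps_in_15_seconds):
    """Approximate Fahrenheit temperature: chirps in 15 seconds plus 37."""
    return chirps_in_15_seconds + 37


def chirps_per_minute(temp_f):
    """Approximate chirps per minute: 3.78 times degrees Fahrenheit, less 137."""
    if not MIN_TEMP_F <= temp_f <= MAX_TEMP_F:
        raise ValueError("relationship was observed only between 45 and 80 degrees")
    return 3.78 * temp_f - 137


print(temperature_from_chirps(22))
print(round(chirps_per_minute(59.0), 2))
```

The two rules are rounded versions of the same relationship, so they agree only approximately: the first implies about 4 chirps per minute per degree, the second 3.78.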

The relationship between these two variables — temperature and cricket 
chirps — is displayed in Chart 215, known as a scatter diagram. Each dot 
represents an observation of one cricket. Thus observation A represents 
a cricket which, at a temperature of 59.0°, chirped 85 times per minute.
The reader should notice that temperature is plotted along the X-axis,


1 Karl Pearson, The Grammar of Science, p. 77, Adam and Charles Black, London,
1900.

while chirps per minute are plotted along the Y-axis. This is because
the number of chirps per minute appears to be a direct result of the tem- 
perature. In this case it is also true that we wish to estimate the number 
of chirps to be expected at a given temperature. Temperature is therefore 
the independent variable, and chirps per minute the dependent variable. 
Even though it were temperature we wished to estimate, it would never- 
theless be best to show the causal factor on the X-axis. When the causal 
relationship is not clear or when neither factor can be said to be the cause 


of the other, then the variable to be predicted should be plotted on the
Y-axis.

Chart 215. Temperature and Chirps per Minute of 115 Crickets. (Data provided by
Mr. Bert E. Holmes. Vertical axis: chirps per minute.)

Judging from Chart 215, we see that the relationship between the two 
variables is linear, for the straight line appears to be as good a fit as a 
more complicated curve. The equation of this line² is

Yc = -137.22 + 3.777X.


2 This equation was fitted by the authors to data furnished by Bert E. Holmes. See
also Bert E. Holmes, “Vocal Thermometers,” The Scientific Monthly, Vol. XXV,
September 1927, pp. 261-264.

From this equation, estimates of chirps can be made for any desired temperature
within the limits of the observations shown on the chart. Thus,
if we wish to estimate the number of chirps when the temperature is 59.0°
(observation A), we find the number by substituting 59.0 for X in the 
equation. Thus 

Yc = -137.22 + (3.777) (59.0) = 86 chirps. 

The estimate could be read, although less accurately, directly from the 
estimating line plotted on the chart. Although the estimate (86) does not 
agree perfectly with the actual observation of 85 chirps, the discrepancy 
is not large. 

We cannot fail to be impressed with the adequacy of the generalization 
expressed in the equation Yc = -137.22 + 3.777X. Since most of the
dots are very close to the line, it appears that frequency of chirps has
been adequately explained by reference to temperature. The slight variations
from the estimating line are unexplained and may be due to
differences between individual crickets, differences associated with the
time of day or year in which the observations were made, humidity,
and inaccuracies of observation of temperature or number of chirps. Also,
the temperature at the spot where the cricket is chirping may be different
from that where the observer is standing. This might be the case if the
cricket were under a stone. An examination of other causes of variation,
in addition to temperature, involves consideration of three or more variables,
a procedure for which will be considered in Chapter XXIV under
the heading of multiple correlation. 

The closeness of the relationship may be expressed in general terms by
stating that the coefficient of correlation, r, is +.9919. Since ±1.0 is perfect
correlation and 0 is no correlation, we can readily imagine that one
almost never finds a higher coefficient than +.9919. The plus sign indicates
that the correlation is positive; that is, the chirps increase as
the temperature increases. Had chirps decreased with increasing temperature,
the correlation would have been negative, or inverse; the sign
of r would have been negative, as would the sign of b in the estimating
equation; and the estimating line would have sloped downward to the
right. 

An illustration of rather low correlation (-.11) is given by Chart 216.
In this case, brain weight was estimated by cranial capacity, and legisla- 
tive ability by a rather complicated system of scoring. But even if we 
assume that all measurements are accurate, the evidence certainly does 
not suggest that legislators should be selected solely from head measure- 
ments. Perhaps there are additional factors which account for legislative 
ability; for example, intelligence, education, initiative, honesty, social 
awareness, and other traits are doubtless important.

Correlation Theory

Correlation may be thought of as involving three types of measure- 
ments, which may conveniently be made in the following order: 

(1) An estimating equation which describes the functional relationship 
between the two variables. As the name indicates, one object of such an 
equation is to make estimates of one variable from another. 


Chart 216. Estimates of Brain Weight and Legislative Ability of 89 Members of
Congress. (Data from “Brain Weight and Legislative Ability in Congress,” by Arthur
MacDonald, Congressional Record, April 12, 1932. Vertical axis: legislative ability.)

(2) A measure of the amount of variation of the actual values of the 
dependent variable from their estimated or computed values. This meas- 
ure of the variation which has not been explained by the estimating equa- 
tion is analogous to a standard deviation and gives an idea, in absolute 
terms, of the dependability of estimates. It is called the scatter, or standard 
error of estimate.

(3) A measure of the degree of relationship, or correlation (r), between
the variables, independent of the units or terms in which they were originally
expressed. A closely related measure (r²) will permit us to state
the relative amount of variation which has been explained by the estimating
equation.

The estimating equation. Foresters sometimes find it convenient to 
estimate the height growth of trees from their growth in diameter, since 
this procedure is quicker than direct measurements of the growth in height.
The scatter diagram, Chart 217, shows the breast-high diameter growth 
and the growth in height of 20 trees, together with the estimating line 
which describes the nature of the relationship between the two variables. 
This straight line has been so fitted that the sum of the squares of the 
Y deviations from it is less than that from any other straight line. A
curve fitted in this manner is usually considered by statisticians to be the
best with which to estimate values of one variable when values of the
other variable are known. The fitting of such a line is similar to the
fitting of a trend and requires the use of the following normal equations:³

I. ΣY = Na + bΣX.

II. ΣXY = aΣX + bΣX².

Chart 217. Breast-High Diameter Growth and Height Growth of 20 Forest Trees.
(Data of Table 157. Vertical axis: height growth in feet; horizontal axis: diameter
growth in inches.)

Table 157 shows the computations that are necessary to determine the 
values which must be substituted. The substitution yields

I. 173 = 20a + 90.7b.

II. 856.0 = 90.7a + 453.93b.

3 The normal equations were discussed in Chapter XV.

Multiplication of all the items in equation I by 4.535 permits us to cancel
out a by subtracting equation I from equation II. Thus

I × 4.535.   784.555 = 90.7a + 411.3245b
II.          856.0   = 90.7a + 453.93b

              71.445 = 42.6055b

                   b = 1.677.

We may now substitute the value of b in equation I in order to find the
value of a.

I. 173 = 20a + 152.1039
     a = 1.045.
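
The same solution can be reproduced mechanically. The sketch below forms the four sums from the data of Table 157 and solves the two normal equations by elimination; the function name solve_normal_equations is illustrative and not part of the original treatment.

```python
# Tree data of Table 157: X = diameter growth (inches), Y = height growth (feet).
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7,
     8, 7, 9, 10, 13, 7, 11, 11, 16, 14]


def solve_normal_equations(x, y):
    """Solve  sum(Y) = N*a + b*sum(X)  and  sum(XY) = a*sum(X) + b*sum(X^2)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Eliminate a from the two equations, then back-substitute in equation I.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = solve_normal_equations(X, Y)
print(f"Yc = {a:.3f} + {b:.3f}X")
print(f"Estimated height growth at X = 5.5: {a + b * 5.5:.3f} feet")
```

Carried to three decimals, the coefficients should agree with the hand computation above; small differences in later decimals arise because the hand work rounds b before solving for a.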


TABLE 157

Computation of Values Used in Computing Estimating Equation for Growth
in Diameter and Height of 20 Forest Trees

Rank in diameter    Diameter growth at    Height growth
growth (smallest    breast height in      in feet
to largest)         inches, X             Y                  XY        X²

      1                  2.3                  7              16.1      5.29
      2                  2.5                  8              20.0      6.25
      3                  2.6                  4              10.4      6.76
      4                  3.1                  4              12.4      9.61
      5                  3.4                  6              20.4     11.56
      6                  3.7                  6              22.2     13.69
      7                  3.9                 12              46.8     15.21
      8                  4.0                  8              32.0     16.00
      9                  4.1                  5              20.5     16.81
     10                  4.1                  7              28.7     16.81
     11                  4.2                  8              33.6     17.64
     12                  4.4                  7              30.8     19.36
     13                  4.7                  9              42.3     22.09
     14                  5.1                 10              51.0     26.01
     15                  5.5                 13              71.5     30.25
     16                  5.8                  7              40.6     33.64
     17                  6.2                 11              68.2     38.44
     18                  6.9                 11              75.9     47.61
     19                  6.9                 16             110.4     47.61
     20                  7.3                 14             102.2     53.29

   Total                90.7                173             856.0    453.93

Source: Donald Bruce and F. X. Schumacher, Forest Mensuration, p. 124, McGraw-Hill, New York,
1935. Courtesy of Publisher and Authors.


The values for a and b are checked by substitution in equation II. While
this does not prove that no errors in computation have been made, yet
if the correct numbers were substituted in the two normal equations, either
no errors, or counterbalancing errors, have been made. Since a = 1.045
and b = 1.677, the equation of the line which enables us to estimate the
growth in height of trees in this particular forest when their growth in 
diameter is known may be stated as 

Yc = 1.045 + 1.677X. 

Suppose now we wish to estimate the height growth of a tree which 
grew 5.5 inches in diameter. Substituting in the equation, we have 

Yc = 1.045 + (1.677)(5.5) 

= 10.268 feet. 


Dependability of estimates. However, we should not expect all trees
which grew 5.5 inches in diameter to have grown exactly 10.268 feet in 
height, for the dots of the scatter diagram do not all lie on the fitted line. 
Rather, 10.268 should be thought of as an estimate of the average height 
growth of all trees of the diameter growth indicated. We should expect 
variations from this value the same as from the arithmetic mean of a 
frequency distribution. It is therefore pertinent to inquire what propor- 
tion of trees may be expected to fall within any range of error in which 
we may be interested, assuming, of course, that we have a representative 
sample. 

To do this, it is necessary to compute the standard deviation of the Y
values, not from their mean, but from the line of estimation. On Chart
217 the vertical distance from the line of estimate to any Y value represents
the difference between the observed Y value and the estimated Y
value. The estimated Y values, Yc, are obtained by solving the estimating
equation for each measurement of diameter growth, or X value. The
deviation Y - Yc represents the error that would have been made in one
particular instance. To obtain a summary measure of those deviations,
they may be squared, summed, and divided by N, and the square root extracted.
This is the scatter, or standard error of estimate, and may be
denoted⁴ by σ_ys or S_y. Its formula may be written


σ_ys = √[Σ(Y - Yc)² / N].

In this illustration σ_ys = √(88.75/20) = √4.44 = 2.107. Calculations are
shown in Table 158, columns 7 and 10. Ordinarily the more expeditious
method of calculation which is explained on page 671 would be used. The
above method is used solely to explain the meaning of the measure.

4 The symbol S_y is frequently used for this concept. Although this measure is frequently
spoken of as a “standard error of estimate,” it is not a standard error in the
sense used in Chapters XXI and XIII. σ_ys is the standard error of an individual item
when the values are measured as deviations from the estimated values (Yc) in the same
sense that σ_y is the standard error of an individual item when the values are measured
as deviations from their mean (Ȳ). Consequently it seems more logical to use the
symbol σ_ys than S_y.
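
The long method just described (square the deviations from the line, sum, divide by N, extract the root) may be sketched as follows. The data are those of Table 157; the helper names are illustrative, and the line is refitted inside the sketch so that it stands alone.

```python
import math

# Tree data of Table 157: X = diameter growth (inches), Y = height growth (feet).
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7,
     8, 7, 9, 10, 13, 7, 11, 11, 16, 14]


def fit_line(x, y):
    """Least-squares a and b for Yc = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def standard_error_of_estimate(x, y):
    """Root of the mean squared deviation of Y from Yc (divisor N, as in the text)."""
    a, b = fit_line(x, y)
    squared = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    return math.sqrt(sum(squared) / len(x)), sum(squared)


s_ys, sum_squared = standard_error_of_estimate(X, Y)
print(f"sum of squared deviations = {sum_squared:.2f}")
print(f"standard error of estimate = {s_ys:.3f}")
```

Because the coefficients are not rounded before the deviations are taken, the results may differ from the text's 88.75 and 2.107 in the last decimal place.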

Chart 218. Estimating Line and Zones of Scatter for Diameter Growth and Height
Growth of 20 Forest Trees. (Data of Table 158. Vertical axis: height growth in feet;
horizontal axis: diameter growth in inches.)

This measure may be interpreted in a manner strictly analogous to that
of the standard deviation of a frequency distribution. Although σ_ys is
called the standard error of estimate, it is not to be thought of as a measure
of the variation of estimates made from different samples, but merely as
a general measure of the variation of the actual Y values from the computed
Y values of a particular sample. It yields an estimate of the range
above and below the line of estimation within which 68.27 per cent of the
items may be expected to fall if the scatter is normal. In practice we
frequently think of this measure as the range within which about 2/3 of the
values will be found. For the case in hand (σ_ys = 2.11), we may expect
to find about 2/3 of the items of Chart 218 within the narrow band
shown in the diagram; about 95 per cent (ideally 95.45) within the wider
band that includes ±2σ_ys; and practically all within ±3σ_ys (theoretically,
with a large number of items, 99.73 per cent of the cases). A count of
the dots shows that within ±σ_ys of the line of estimate, 13 of the 20 items
(65 per cent) are found; within ±2σ_ys of the line, 19 of the items (95 per
cent) appear; and within ±3σ_ys are included all 20 of the items. The slight
discrepancies may have been due to the fact that the sample was small




TABLE 158

Computation of Total Variance, Explained Variance, and Unexplained Variance, for Height Growth of 20 Forest Trees as
Estimated by Their Diameter Growth

[Table body not recoverable.]

Source: Data of Table 157.
and the scatter not normally distributed around the estimating equation.
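
The count of dots falling within each zone of scatter can be checked directly from the data of Table 157. The following is a sketch with illustrative helper names; it refits the line and counts the items whose deviation from it does not exceed one, two, and three times the standard error of estimate.

```python
import math

# Tree data of Table 157: X = diameter growth (inches), Y = height growth (feet).
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7,
     8, 7, 9, 10, 13, 7, 11, 11, 16, 14]


def residuals_and_scatter(x, y):
    """Deviations Y - Yc from the least-squares line, and their root mean square."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    scatter = math.sqrt(sum(r * r for r in residuals) / n)
    return residuals, scatter


residuals, scatter = residuals_and_scatter(X, Y)
counts = {k: sum(1 for r in residuals if abs(r) <= k * scatter) for k in (1, 2, 3)}
for k, count in counts.items():
    print(f"within {k} x scatter: {count} of {len(Y)} items")
```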

It was calculated that trees with growth in diameter of 5.5 inches should 
average 10.268 feet in height growth. We may now amplify the statement
by saying that, if our sample is representative, about 2/3 of such trees
should vary in height growth between 8.16 feet and 12.38 feet (10.268 ±
2.107); or, considering a slightly wider range, about 95 out of 100 should
lie between 6.05 feet and 14.48 feet. The proportion lying within any 
other range could readily be computed also by referring to Appendix E. 
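
In place of a table of normal-curve areas, the proportion expected within any multiple of the standard error of estimate can be computed from the error function. The sketch below assumes, as the text does, that the scatter about the line is normal; the estimate (10.268) and the scatter (2.107) are taken from the text, and the function names are illustrative.

```python
import math


def share_within(k):
    """Proportion of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


def band(center, sigma, k):
    """Lower and upper limits of the zone center +/- k * sigma."""
    return center - k * sigma, center + k * sigma


estimate = 10.268   # estimated height growth (feet) for 5.5 inches of diameter growth
scatter = 2.107     # standard error of estimate (feet)

for k in (1, 2, 3):
    low, high = band(estimate, scatter, k)
    print(f"{low:.3f} to {high:.3f} feet: {100 * share_within(k):.2f} per cent expected")
```

A non-integral multiple, such as k = 1.5, may be passed in the same way to obtain the proportion for any other range.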

These statements concerning range of error have to do, not with cer- 
tainty, but only with expectation. We have used only 20 items, and, 
even though the sample may have been carefully chosen, another sample 
of 20 would not give us precisely the same results as those obtained above. 
It might be that we could reduce uncertainty further, not only by in- 
creasing the size of our sample but also by comparing variations in height 
growth with some other factor in addition to diameter growth — for ex- 
ample, age, since as trees grow older their rate of growth may change. 
Also, the character and quantity of plant food in the soil and the degree 
of crowding of the trees might be considered. Even if several factors in 
addition to diameter growth were considered, there would still be some 
unexplained variations, and therefore still some uncertainty.

The correlation coefficient and explained variability. Another measure 
closely related to the estimating equation and scatter, and frequently used 
in the social sciences is the coefficient of correlation r. The estimating equation
Yc = a + bX is a statement of the way in which the dependent variable
changes with variations in the independent variable. σ_ys is an indication
of the amount of dispersion in the dependent variable which we
have failed to account for by our line of estimation, but it is stated in
terms of the original data (in the case of the diameter growth and height
growth data, in feet). When stating the degree of relationship between
two variables, it is convenient to be able to state results in concise numerical
terms which are independent of the units of the original data, and to express
the degree of relationship between two series even though we do
not know the equation of the line of estimation or σ_ys. To be sure, something
is lost by so compressing the information, since it does not enable
us to make an estimate of the value of one variable from the other, or
to tell, in absolute magnitude, the degree of accuracy of any prediction
we may make. But something is gained too, since one coefficient can be
compared with any other, regardless of the subject matter of the different
correlations. As has been stated, the coefficient of correlation is a number
varying from +1, through zero, to -1. The sign indicates whether the
slope of the line of relationship is positive or negative, while the magnitude
of the coefficient indicates the degree of association. When there is absolutely
no relationship between the variables, r is 0.

A clear understanding of the meaning of the coefficient of correlation 
is given by the following approach. One measure of variability, called variance
or total variance, is the square of the standard deviation of the Y
values. This total variance can be broken up into two parts: that which
has been explained by our line of relationship, and that which we have
failed to explain. The total variance in height growth of the trees of our
distribution, as indicated by the calculations in column 8 of Table 158,
is (3.229)², or 10.43. The amount of variability which we have explained
by our line of relationship may likewise be measured by the square of
another standard deviation, that of the estimated Y values from their
own mean (which is also the mean of the original Y values).⁵ The explained
variance is shown in column 9 of Table 158 to be (2.448)² = 5.99. The
unexplained variance is the square of the standard error of estimate. But
this measure has already been found to be 2.107; hence the unexplained
variance is (2.107)² = 4.44.

Let us summarize our findings : 


Variance       Symbol and formula            Amount of     Per cent of
                                             variance      total variance

Unexplained    σ_ys² = Σ(Y - Yc)² / N           4.44           42.6
Explained      σ_yc² = Σ(Yc - Ȳ)² / N           5.99           57.4
Total          σ_y²  = Σ(Y - Ȳ)² / N           10.43          100.0
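
The three variances, and the proportion explained, can be recomputed from the data of Table 157. The sketch below uses illustrative names and a divisor of N throughout, following the text; it also returns the coefficient of determination and the coefficient of correlation, the latter given the sign of the slope b.

```python
import math

# Tree data of Table 157: X = diameter growth (inches), Y = height growth (feet).
X = [2.3, 2.5, 2.6, 3.1, 3.4, 3.7, 3.9, 4.0, 4.1, 4.1,
     4.2, 4.4, 4.7, 5.1, 5.5, 5.8, 6.2, 6.9, 6.9, 7.3]
Y = [7, 8, 4, 4, 6, 6, 12, 8, 5, 7,
     8, 7, 9, 10, 13, 7, 11, 11, 16, 14]


def variance_breakdown(x, y):
    """Total, explained, and unexplained variance of Y about a fitted straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    yc = [a + b * xi for xi in x]
    total = sum((yi - mean_y) ** 2 for yi in y) / n
    explained = sum((yci - mean_y) ** 2 for yci in yc) / n
    unexplained = sum((yi - yci) ** 2 for yi, yci in zip(y, yc)) / n
    r_squared = explained / total
    r = math.copysign(math.sqrt(r_squared), b)   # r takes the sign of the slope
    return {"total": total, "explained": explained,
            "unexplained": unexplained, "r_squared": r_squared, "r": r}


result = variance_breakdown(X, Y)
for name, value in result.items():
    print(f"{name:12s} {value:.3f}")
```

The explained and unexplained variances should add to the total variance apart from floating-point error, which is the relationship summarized in the table.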


It may be helpful to some readers also to visualize this information.
Chart 219 shows for the data of height growth:

A. The derivation of total variance, which is based upon the deviations
of the actual Y values from their mean.

B. The derivation of explained variance, which is based upon the deviations
of the computed Y values from their mean. (Note that the mean of the
computed values equals the mean of the actual values: Ȳc = Ȳ.)

C. The derivation of unexplained variance, which is based upon the
deviations of the actual Y values from the computed Y values.

The values of each standard deviation and variance are shown at the
right. The variances are indicated by shaded squares, and it is to be
observed that the sum of the areas of the two lower squares equals that
of the upper square; that is, total variance is the sum of the explained
variance and unexplained variance, or

σ_y² = σ_yc² + σ_ys².

5 See Appendix B, section XXII-1, equation 2.

Chart 219. Derivation of Variance of Height Growth of 20 Forest Trees as Explained
by Their Diameter Growth. (Data of Table 158. Panels: A, total variance; B, explained
variance; C, unexplained variance. Horizontal axis: diameter growth in inches.)
On the other hand, the standard deviation of the original data is smaller
than the sum of the standard deviation of the computed values and the
standard error of estimate. The relationship is clearly seen if we think
of the standard deviation as being the hypotenuse of a right triangle and
the other two standard deviations as being the other two sides, as in
Chart 220.

[Chart 220. Diagrammatic Representation of Relationship between Standard
Deviations and Variances. (Data of Table 158.)]
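The right-triangle relationship can be verified numerically from the three
standard deviations quoted for the forest-tree data. The following Python
sketch is only a check on the arithmetic; the variable names are our own.

```python
import math

sd_explained = 2.448    # standard deviation of the computed Y values
sd_unexplained = 2.107  # standard error of estimate
sd_total = 3.229        # standard deviation of the original Y values

# Hypotenuse of the right triangle whose legs are the two component deviations.
hypotenuse = math.hypot(sd_explained, sd_unexplained)

# The plain sum of the two legs, which must exceed the hypotenuse.
simple_sum = sd_explained + sd_unexplained

print(f"hypotenuse  = {hypotenuse:.3f} (quoted total: {sd_total})")
print(f"sum of legs = {simple_sum:.3f}")
```

The hypotenuse reproduces the total standard deviation to within rounding,
while the simple sum of the two legs is considerably larger.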

See "Geometric Presentation of Correlation," by John W. Morse, Journal of the
American Statistical Association, Vol. 32, June 1937, pp. 364-365. For proof that

$$\sigma_y^2 = \sigma_{y_c}^2 + \sigma_{y_s}^2$$

see Appendix B, section XXII-1, equation 9.

The coefficient of determination, $r^2$, is the proportion of total variance
which has been explained (.574 in this case). The coefficient of correlation,
r, is the square root of the coefficient of determination. Thus the
coefficient of correlation (+.758) may be thought of as the square root of
the proportion of variance that has been explained. r will, of course,
always be numerically larger than $r^2$ unless $r^2 = 1$, when $r = \pm 1$,
or unless r = 0.

$$r^2 = \frac{\sigma_{y_c}^2}{\sigma_y^2} = \frac{5.99}{10.43} = .574,$$

and

$$r = \sqrt{.574} = +.758.$$

Also

$$r^2 = 1 - \frac{\sigma_{y_s}^2}{\sigma_y^2} = 1 - \frac{4.44}{10.43} = 1 - .426 = .574,$$

and

$$r = +.758.$$

The proportion of variance which has not been explained is sometimes called the
coefficient of non-determination, $k^2$. From the table on p. 661, it can be seen that
$r^2 + k^2 = 1$. Just as the square root of the coefficient of determination is the
coefficient of correlation, so the square root of the coefficient of non-determination
is known as the coefficient of alienation. But, although $r^2 + k^2 = 1$, $r + k > 1$.
See Mordecai Ezekiel, Methods of Correlation Analysis, pp. 376-377, John Wiley and
Sons, New York, 1930.

Since r is the square root of the coefficient of determination, r is also the ratio
of the standard deviation of the computed values to the standard deviation of the
original data; that is, $r = \sigma_{y_c} / \sigma_y$.

A further simplification of the formula for r (of which use will be made in chapters
to follow) is to regard r as the square root of the proportion of variation (sum of
squared deviations) that has been explained. The development is as follows:

$$\sigma_y^2 = \sigma_{y_c}^2 + \sigma_{y_s}^2,$$

$$\frac{\Sigma(Y - \bar{Y})^2}{N} = \frac{\Sigma(Y_c - \bar{Y})^2}{N} + \frac{\Sigma(Y - Y_c)^2}{N},$$

$$\Sigma(Y - \bar{Y})^2 = \Sigma(Y_c - \bar{Y})^2 + \Sigma(Y - Y_c)^2.$$

Therefore r may be obtained from

$$r = \sqrt{\frac{\Sigma(Y_c - \bar{Y})^2}{\Sigma(Y - \bar{Y})^2}} = \sqrt{1 - \frac{\Sigma(Y - Y_c)^2}{\Sigma(Y - \bar{Y})^2}}.$$

Since $Y - \bar{Y} = y$; $Y - Y_c = y_s$; $Y_c - \bar{Y} = y_c$, we may write
$\Sigma y^2 = \Sigma y_c^2 + \Sigma y_s^2$, and

$$r = \sqrt{\frac{\Sigma y_c^2}{\Sigma y^2}} = \sqrt{1 - \frac{\Sigma y_s^2}{\Sigma y^2}}.$$

Occasionally a method of curve fitting is employed which results in lack of equality
between $\Sigma y^2$ and $\Sigma y_c^2 + \Sigma y_s^2$. For example, such a situation
may obtain when the estimating equation is fitted by inspection. In such cases it is
customary to use the formula

$$r = \sqrt{1 - \frac{\Sigma y_s^2}{\Sigma y^2}}.$$

The reason for using this expression rather than the one involving $\Sigma y_c^2$ is
that the criterion of fit is the least squares criterion, and so the closeness of the
correlation as well as the goodness of the fit depends on reducing the squares of the
residuals from the fitted line.

If the two variables X and Y are thought of as being composed of elements equally
likely to be present in any item (some of which are common to X and Y, but some of
which occur in the one and not the other), then the coefficient of determination of the
entire population is the product of the two proportions of common elements, and the
coefficient of correlation is their geometric mean. Let us take 5 disks (elements) marked
on one side as follows (the other side being blank): two with X only, two with both
X and Y, and one with Y only. If we should throw all 5 disks in the air, when they
fall any number of X's from 0 to 4 might appear, and also from 0 to 3 Y's. Whenever
an X appears, the chances that a Y will also appear on the same disk are 2 out of 4;
likewise, whenever a Y appears, the chances are 2 out of 3 that an X will appear on
the same disk. If we should throw these disks in the air a number of times, counting
the X's and Y's each time, there would be correlation between the number of X's that
appear from throw to throw and the number of Y's. The most likely value of $r^2$ is
$\frac{2}{4} \times \frac{2}{3} = +.333$, while the most likely value of r is
$\sqrt{\frac{2}{4} \times \frac{2}{3}} = +.58$. The larger the number of throws, the
greater will be the tendency for r to approach this value. For a demonstration of this
theory, see Croxton and Cowden, Practical Business Statistics, pp. 416-419,
Prentice-Hall, Inc., New York, 1934.

Less laborious methods of computing r, which are not quite so straightforward
in their meaning, are explained on pages 671-672.

The sign of r is always the same as the sign of b in the equation of
relationship. Unless the value of the coefficient is very low, the sign of r
can be determined by inspection of the scatter diagram.

The procedure just outlined has involved the obtaining of an equation
with which to estimate values of Y from known values of X, and the
explaining of variations in Y values by associating them with variations in X
values. Mathematically it would be possible to obtain a line, the sum of
the squares of the horizontal deviations from which is at a minimum, and
from that line to explain variations in X values or to estimate X values
from given Y values. The equation $X_c = a' + b'Y$ would not describe
the same line as $Y_c = a + bX$, nor would $\sigma_{x_s}$ be the same as
$\sigma_{y_s}$. However, r would be the same regardless of which route was used
to obtain it. This is merely to say that mathematically, so far as r is
concerned, either variable may be labeled X and the other Y. The X variable
(plotted along the horizontal axis) is customarily the causal factor, if
causation can logically be inferred; the Y variable is the resultant factor.
If causation cannot be inferred, the factor which is the basis of our
estimates is considered the X, or independent, variable. If, however, we wish
to estimate values of the causal series from values of the resulting series,
the causal factor may still be plotted as the X variable, but the estimating
equation is of the type

$$X_c = a' + b'Y.$$

The normal equations required for this line would be:

I. $\Sigma X = Na' + b'\Sigma Y$,

II. $\Sigma XY = a'\Sigma Y + b'\Sigma Y^2$.
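The point that the two estimating lines differ while r is the same by either
route can be illustrated numerically. The sketch below uses a small set of
hypothetical paired observations (invented for illustration, not taken from
the text) and fits both lines by least squares.

```python
import math

# Hypothetical paired observations, invented purely for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_xx = sum((x - mean_x) ** 2 for x in X)
sum_yy = sum((y - mean_y) ** 2 for y in Y)

# Line for estimating Y from X (squared vertical deviations minimized).
b = sum_xy / sum_xx
a = mean_y - b * mean_x

# Line for estimating X from Y (squared horizontal deviations minimized).
b_prime = sum_xy / sum_yy
a_prime = mean_x - b_prime * mean_y

# r does not depend on which variable is treated as dependent.
r = sum_xy / math.sqrt(sum_xx * sum_yy)

print(f"Yc = {a:.4f} + {b:.4f} X")
print(f"Xc = {a_prime:.4f} + {b_prime:.4f} Y")
print(f"r = {r:.4f}, b * b' = {b * b_prime:.4f}, r^2 = {r * r:.4f}")
```

The product of the two slopes equals $r^2$, so the two lines coincide only when
the correlation is perfect.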


The product-moment formula. The coefficient of correlation may be
approached from a number of different points of view. The approach
which has been followed is especially enlightening, since essentially the
same technique can be applied to curvilinear and multiple correlation.
But the following explanation is also simple and, for certain purposes,
extremely useful.

In the estimating equation b tells us the normal amount by which the
dependent variable changes with a change of one unit in the independent
variable. It is the slope or $y_c / x$ ratio of any point on the estimating
equation, when y and x are defined as deviations from the mean of the series,
so that the estimating equation becomes $y_c = bx$, and b is obtained by
finding the value of $\Sigma xy / \Sigma x^2$ (see Appendix B, section XXII-2).
Although this constant b is essential for purposes of estimation, still it
cannot tell us the degree of relationship between the variables, since they
are not directly comparable with each other. The X series and the Y series do
not have the same dispersion, and may even be in different physical units.
However, comparability between the terms of the ratio $y_c / x$ can be
obtained by dividing the numerator by $\sigma_y$ and the denominator by
$\sigma_x$, or by dividing the entire expression by $\sigma_y / \sigma_x$.
Thus, b is transformed into r as follows:

$$r = \frac{\Sigma xy}{\Sigma x^2} \div \frac{\sigma_y}{\sigma_x} = \frac{\Sigma xy}{\Sigma x^2} \cdot \frac{\sigma_x}{\sigma_y} = \frac{\Sigma xy}{N\sigma_x^2} \cdot \frac{\sigma_x}{\sigma_y} = \frac{\Sigma xy}{N\sigma_x\sigma_y}.$$

Although we are referring to the standard deviation of the Y values, we use the
symbol $\sigma_y$ instead of $\sigma_Y$ for convenience. It should be clear that
$\sigma_y = \sigma_Y$, because $\bar{y} = 0$, $Y - \bar{Y} = y - 0$, and
$\Sigma(Y - \bar{Y})^2 = \Sigma y^2$.

In this form the ratio is known as the product-moment form of the
coefficient of correlation. Thus it may be seen that r is merely the slope of
the estimating equation when both numerator and denominator are in
standard deviation units.

Another way of getting the same result is to think of r as a special case of b; namely,
when the original data have been made comparable by expressing them in units of their
own standard deviations. Thus

$$b = \frac{\Sigma xy}{\Sigma x^2}$$

becomes

$$r = \frac{\Sigma\left(\frac{x}{\sigma_x} \cdot \frac{y}{\sigma_y}\right)}{\Sigma\left(\frac{x}{\sigma_x}\right)^2} = \frac{\Sigma xy}{\sigma_x\sigma_y} \cdot \frac{\sigma_x^2}{\Sigma x^2} = \frac{\Sigma xy}{\sigma_x\sigma_y} \cdot \frac{\sigma_x^2}{N\sigma_x^2} = \frac{\Sigma xy}{N\sigma_x\sigma_y}.$$

Now since $r = b\,\sigma_x / \sigma_y$, it follows that

$$b = r\frac{\sigma_y}{\sigma_x}$$

and

$$y_c = r\frac{\sigma_y}{\sigma_x}x.$$

Analogously

$$x_c = r\frac{\sigma_x}{\sigma_y}y.$$

Use of the estimating equation in this form, $y_c = r\frac{\sigma_y}{\sigma_x}x$,
will be made later in this chapter.
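The product-moment form, and its relation to the slope b, can be made concrete
with a few lines of Python. The data below are hypothetical (invented for
illustration only), and the variable names are our own; the sketch simply
computes r both as $\Sigma xy / (N\sigma_x\sigma_y)$ and as b restated in
standard deviation units.

```python
import math

# Hypothetical paired observations, invented purely for illustration.
X = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
Y = [30.0, 41.0, 38.0, 55.0, 61.0, 80.0]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Deviations from the respective means (the lower-case x and y of the text).
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]

sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)
sigma_x = math.sqrt(sum_x2 / n)
sigma_y = math.sqrt(sum(q * q for q in y) / n)

b = sum_xy / sum_x2                                  # slope of yc = b x
r_product_moment = sum_xy / (n * sigma_x * sigma_y)  # product-moment form
r_from_slope = b * sigma_x / sigma_y                 # b in standard-deviation units

print(f"b = {b:.4f}")
print(f"r (product-moment)        = {r_product_moment:.4f}")
print(f"r (b * sigma_x / sigma_y) = {r_from_slope:.4f}")
```

The two values of r agree, whereas b itself changes whenever the units of
either series are changed.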


The product-moment formula is often stated also as

$$r = \frac{\Sigma\left(\frac{x}{\sigma_x} \cdot \frac{y}{\sigma_y}\right)}{N}.$$

The reason for the adjective "product-moment" becomes clear when it is realized that
the word "moment" refers to the average of some power of the deviations from a mean.
Thus r is the first moment of the product of the variables when each has been
previously stated in terms of its own standard deviation. For proof, see Appendix B,
section XXII-3.

Practical Methods of Computation

The previous illustration involved a limited number of paired items in
order to illustrate the theory of correlation as concisely as possible. In
most practical problems, however, we have a large number of pairs of
items. In practice, therefore, it is advisable to modify the foregoing
methods slightly in order to save time, and we shall illustrate the shorter
procedures by means of a sample involving 64 pairs of items. The reader
can readily see that the procedure described previously would be very
laborious when applied to such a problem.

As the initial step in a correlation problem, a scatter diagram should
always be drawn. If only an approximate idea of the degree of relationship
is required, inspection of the scatter plot yields satisfactory results.
After a little experience in correlating, the statistician is able to make
surprisingly close estimates of r by inspection. The scatter diagram may
frequently be used for exploratory purposes and may occasionally yield
sufficient information to eliminate the need of determining the coefficient
of correlation.

The data used in the following illustration of procedure are dividends
per share and lowest price per share, during 1935, of common stocks of 64
American industrial corporations. These companies were selected at random
from Moody's Industrials, 1936. Chart 221 is the scatter diagram,
while the data and computations required for values used in the formulae
are shown in Table 159.

[Chart 221. Dividends per Share and Low Price per Share of 64 Medium Grade
Common Stocks of American Industrial Corporations, 1935. (Data of Table 159.)
Vertical axis: low price, dollars per share.]

Substituting in the two normal equations

$$\Sigma Y = Na + b\Sigma X,$$

$$\Sigma XY = a\Sigma X + b\Sigma X^2,$$

we have:

$$1{,}600.61 = 64a + 102.94b,$$

$$3{,}198.1790 = 102.94a + 216.9558b,$$

and the equations solved simultaneously yield the estimating equation

$$Y_c = 5.48606 + 12.1382X.$$

Values for a and b have been carried to six digits, since subsequent
calculations will be based on them.
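The simultaneous solution of the two normal equations can be reproduced in a
few lines. The sketch below is a minimal illustration in Python, using the
totals of Table 159 and Cramer's rule; the variable names are our own and the
method of solution is a convenience, not necessarily the one the authors used.

```python
# Totals from Table 159 (64 stocks).
N = 64
sum_x = 102.94       # sum of X (dividends per share)
sum_y = 1600.61      # sum of Y (low price per share)
sum_xy = 3198.1790   # sum of XY
sum_x2 = 216.9558    # sum of X squared

# Normal equations:
#   sum_y  = N * a     + b * sum_x
#   sum_xy = a * sum_x + b * sum_x2
# solved here by Cramer's rule.
det = N * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / det
b = (N * sum_xy - sum_x * sum_y) / det

print(f"Yc = {a:.5f} + {b:.4f} X")
```

Carrying a and b to six digits, as the text recommends, matters because both
constants are multiplied by large totals in the later shortcut formulae.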




TABLE 159

Computation of Values Used in Determining Measures of Relationship Between
Dividends per Share and Low Price in 1935 of Medium Grade
Stocks of 64 American Corporations

X = dividends per share in 1935; Y = low price during 1935.

Company*                                   X        Y         XY       X^2          Y^2

Alaska Juneau Mining Co.               $1.20   $13.25    15.9000    1.4400     175.5625
American Agric. Chem. Co. (Del.)        2.50    41.50   103.7500    6.2500   1,722.2500
American Machinery & Foundry Co.        1.00    18.50    18.5000    1.0000     342.2500
Anchor Cap Corp.                         .60    10.88     6.5280     .3600     118.3744
Armstrong Cork Co.                       .88    16.50    14.5200     .7744     272.2500
Associated Oil Company                  1.00    29.75    29.7500    1.0000     885.0625
Bloomingdale Brothers, Inc.              .40    16.62     6.6480     .1600     276.2244
Borg-Warner Corp.                       1.75    28.25    49.4375    3.0625     798.0625
Brown Shoe Co.                          3.00    53.00   159.0000    9.0000   2,809.0000
Burroughs Adding Machine Co.            1.05    13.25    13.9125    1.1025     175.5625
California Packing Corp.                1.50    30.50    45.7500    2.2500     930.2500
Cannon Mills Co.                        2.00    30.00    60.0000    4.0000     900.0000
Chicago Mail Order Co.                  2.00    15.12    30.2400    4.0000     228.6144
Cleveland Graphite Bronze Co.           1.25    27.62    34.5250    1.5625     762.8644
Colgate-Palmolive-Peet Co.               .75    15.12    11.3400     .5625     228.6144
Commercial Solvents Corp.                .85    16.50    14.0250     .7225     272.2500
Cudahy Packing Co.                      2.50    37.00    92.5000    6.2500   1,369.0000
Diamond Match Co.                       1.95    26.50    51.6750    3.8025     702.2500
Duplan Silk Corp.                       1.00    12.75    12.7500    1.0000     162.5625
Electric Storage Battery Co.            3.25    39.00   126.7500   10.5625   1,521.0000
Eureka Vacuum Cleaning Co., Inc.         .80    10.50     8.4000     .6400     110.2500
Freeport Texas Co.                      1.00    17.25    17.2500    1.0000     297.5625
General American Transportation Co.     1.75    32.62    57.0850    3.0625   1,064.0644
General Mills, Inc.                     3.00    59.88   179.6400    9.0000   3,585.6144
Glidden Company                         1.60    23.38    37.4080    2.5600     546.6244
Granite City Steel Co.                  1.00    18.12    18.1200    1.0000     328.3344
W. T. Grant Co.                         1.25    26.00    32.5000    1.5625     676.0000
Harbison-Walker Refractories Co.        1.00    16.00    16.0000    1.0000     256.0000
Hercules Powder Co.                     3.50    71.00   248.5000   12.2500   5,041.0000
Industrial Rayon Corporation            1.26    23.50    29.6100    1.5876     552.2500
International Printing Ink Corp.        1.10    21.50    23.6500    1.2100     462.2500
Kaufman Department Stores, Inc.         1.00     7.50     7.5000    1.0000      56.2500
S. S. Kresge Co.                        1.00    19.75    19.7500    1.0000     390.0625
Lambert Co.                             2.75    21.38    58.7950    7.5625     457.1044
Libby-Owens-Ford Glass Co.              1.20    21.50    25.8000    1.4400     462.2500
Manhattan Shirt Co.                      .60    10.00     6.0000     .3600     100.0000
Marlin-Rockwell Corp.                   1.50    20.00    30.0000    2.2500     400.0000
McCall Corporation                      2.00    28.00    56.0000    4.0000     784.0000
Melville Shoe Corporation               2.88    41.00   118.0800    8.2944   1,681.0000
MacAndrews and Forbes Co.               3.00    37.88   113.6400    9.0000   1,434.8944
John Morrell and Co., Inc.              3.30    41.88   138.2040   10.8900   1,753.9344
Natomas Co.                              .95     7.50     7.1250     .9025      56.2500
Newberry (J. J.) Co.                    1.45    43.50    63.0750    2.1025   1,892.2500
Peoples Drug Stores, Inc.               2.75    30.00    82.5000    7.5625     900.0000
Phelps Dodge Corp.                       .50    12.75     6.3750     .2500     162.5625
Pillsbury Flour Mills Co.               1.60    31.00    49.6000    2.5600     961.0000
Phillips Petroleum Co.                  1.25    13.75    17.1875    1.5625     189.0625
Pullman Incorporated                    2.62    29.50    77.2900    6.8644     870.2500
Raybestos-Manhattan, Inc.               1.00    16.50    16.5000    1.0000     272.2500
Reynolds Metals Co.                     1.00    17.50    17.5000    1.0000     306.2500
Safeway Stores, Inc.                    2.75    31.62    86.9550    7.5625     999.8244
Socony-Vacuum Oil Co.                    .30    10.62     3.1860     .0900     112.7844
South Porto Rico Sugar Co.              2.00    20.00    40.0000    4.0000     400.0000
Spencer Kellogg & Sons Co.              1.60    31.00    49.6000    2.5600     961.0000
Standard Oil Co. of California          1.00    27.75    27.7500    1.0000     770.0625
Telautograph Corp.                       .75     6.25     4.6875     .5625      39.0625
Thatcher Manufacturing Co.               .75    13.12     9.8400     .5625     172.1344
Timken Roller Bearing Co.               3.00    28.38    85.1400    9.0000     805.4244
United Biscuit Co. of America           1.60    20.25    32.4000    2.5600     410.0625
Vulcan Detinning Co.                    4.00    63.50   254.0000   16.0000   4,032.2500
Warren Foundry Pipe Corp.               1.75    20.62    36.0850    3.0625     425.1844
Wesson Oil & Snowdrift Co., Inc.        2.50    30.50    76.2500    6.2500     930.2500
Westinghouse Air Brake Co.               .50    18.00     9.0000     .2500     324.0000
Westvaco Chlorine Products Corp.         .40    16.75     6.7000     .1600     280.5625

Total                                 102.94 1,600.61 3,198.1790  216.9558  51,363.9273

Source: Moody's Industrials, 1936.

* Corporations were selected at random from list in the Annalist. The sample includes
only corporations whose shares were listed on the New York Stock Exchange and traded
in during 1935. It includes only stocks with Fitch ratings of B and BB, and which paid
dividends in 1934 and 1935.

Instead of computing $\sigma_{y_s}^2$ from the formula

$$\sigma_{y_s}^2 = \frac{\Sigma y_s^2}{N} = \frac{\Sigma(Y - Y_c)^2}{N},$$

a less direct method is much easier. First we compute $\Sigma Y_c^2$ by the
expression

$$\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY.$$

Then we obtain

$$\Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2 = \Sigma Y^2 - (a\Sigma Y + b\Sigma XY).$$

Proof of the formulae for $\Sigma Y_c^2$ and $\Sigma y_s^2$ is given in Appendix B
(equations 3, 6, and 7).

The indirect but very easy formula for obtaining $\sigma_{y_s}^2$, then, is

$$\sigma_{y_s}^2 = \frac{\Sigma Y^2 - (a\Sigma Y + b\Sigma XY)}{N}.$$

Substituting in this formula, we find:

$$\sigma_{y_s}^2 = \frac{51{,}363.9273 - [(5.48606)(1{,}600.61) + (12.1382)(3{,}198.1790)]}{64} = \frac{51{,}363.93 - 47{,}601.18}{64} = 58.79,$$

and $\sigma_{y_s}$ = 7.667 dollars.

The coefficient of correlation may be obtained as follows:

$$r^2 = \frac{\sigma_{y_c}^2}{\sigma_y^2} = \frac{\Sigma y_c^2 \div N}{\Sigma y^2 \div N} = \frac{(a\Sigma Y + b\Sigma XY) - \bar{Y}\Sigma Y}{\Sigma Y^2 - \bar{Y}\Sigma Y}.$$

For proof that $\Sigma y_c^2 = (a\Sigma Y + b\Sigma XY) - \bar{Y}\Sigma Y$, see
Appendix B, section XXII-1, equations 4 and 5. For proof that
$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\Sigma Y$, see Appendix B, section XIII-2, part B-1.

This is a very easy expression to use, since most of the values have already
been computed. Thus

$$r^2 = \frac{47{,}601.18 - (25.0095)(1{,}600.61)}{51{,}363.93 - (25.0095)(1{,}600.61)} = \frac{7{,}570.543}{11{,}333.422} = .6680,$$

$$r = +.8173.$$

It is also easy to obtain r by the formula:

$$r^2 = 1 - \frac{\sigma_{y_s}^2}{\sigma_y^2}.$$

The value of $\sigma_{y_s}^2$ has already been computed, and $\Sigma Y^2$ and
$\Sigma Y$ are available, from which to compute $\sigma_y^2$. The sign of r is not
given by these expressions, but r takes the sign of b in the estimating equation,
or, as noted earlier, the sign may usually be discovered from an examination of the
scatter diagram.
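The shortcut formulae for the standard error of estimate and for $r^2$ can be
sketched as follows. This is an illustrative Python fragment using the totals
of Table 159; the variable names are our own, and a and b are recomputed at
full precision rather than taken as the six-digit values printed in the text,
so the last digit of some results may differ slightly from the book's.

```python
import math

# Totals from Table 159.
N = 64
sum_x, sum_y = 102.94, 1600.61
sum_xy, sum_x2, sum_y2 = 3198.1790, 216.9558, 51363.9273

# Constants of the estimating equation, from the two normal equations.
det = N * sum_x2 - sum_x ** 2
b = (N * sum_xy - sum_x * sum_y) / det
a = (sum_y - b * sum_x) / N

# Sum of the squared computed values, then the unexplained variance.
sum_yc2 = a * sum_y + b * sum_xy
var_unexplained = (sum_y2 - sum_yc2) / N
std_error_estimate = math.sqrt(var_unexplained)

# Coefficient of determination from the same quantities.
mean_y = sum_y / N
r_squared = (sum_yc2 - mean_y * sum_y) / (sum_y2 - mean_y * sum_y)
r = math.copysign(math.sqrt(r_squared), b)   # r takes the sign of b

print(f"variance about the line    = {var_unexplained:.2f}")
print(f"standard error of estimate = {std_error_estimate:.3f}")
print(f"r^2 = {r_squared:.4f}, r = {r:+.4f}")
```

Note that the two ways of obtaining $r^2$ given in the text, explained variance
over total and one minus unexplained variance over total, necessarily agree
when the line has been fitted by least squares.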

If it is desired to compute r first, and to regard $\sigma_{y_s}$ and the
estimating equation as a by-product, it may be done quite readily. It will be
recalled that one expression for the coefficient of correlation mentioned on
page 666 is

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y}.$$

In this formula the variables are taken as deviations from their respective
means. If they are taken in their original form ($x = X - \bar{X}$ and
$y = Y - \bar{Y}$), then

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}.$$

For derivation of this formula, see Appendix B, section XXII-4. The value of $r^2$
may be obtained by squaring r. However, a somewhat more accurate figure may be
obtained by first squaring the numerator and denominator of the expression for r.

Thus

$$r = \frac{(64)(3{,}198.1790) - (102.94)(1{,}600.61)}{\sqrt{[(64)(216.9558) - (102.94)^2][(64)(51{,}363.927) - (1{,}600.61)^2]}} = +.8173.$$

Having obtained r, we may compute our estimating equation by means
of the two normal equations used before, or by use of the equation
developed on page 667:

$$y_c = r\frac{\sigma_y}{\sigma_x}x.$$

Since the data are not in deviation form, the equation must be written

$$(Y_c - \bar{Y}) = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}).$$

The equation for estimating price from dividends therefore becomes

$$Y_c - 25.0095 = .8173\,\frac{\sigma_y}{\sigma_x}(X - 1.6084),$$

and

$$Y_c = 5.4861 + 12.138X,$$

which is the same as previously obtained.

The calculation of the means and standard deviations, which this
procedure requires, is not shown, since it is a matter with which the student
is already acquainted.
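The route just described, r first and the estimating equation as a by-product,
can be sketched in Python from the totals of Table 159. The variable names are
our own; the means and standard deviations, which the text does not show, are
computed here from the same totals.

```python
import math

# Totals from Table 159.
N = 64
sum_x, sum_y = 102.94, 1600.61
sum_xy, sum_x2, sum_y2 = 3198.1790, 216.9558, 51363.9273

# r computed directly from the original (undeviated) values.
numerator = N * sum_xy - sum_x * sum_y
denominator = math.sqrt((N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Means and standard deviations needed for the estimating equation.
mean_x, mean_y = sum_x / N, sum_y / N
sigma_x = math.sqrt(sum_x2 / N - mean_x ** 2)
sigma_y = math.sqrt(sum_y2 / N - mean_y ** 2)

# Yc - mean_y = r * (sigma_y / sigma_x) * (X - mean_x), rearranged to Yc = a + b X.
b = r * sigma_y / sigma_x
a = mean_y - b * mean_x

print(f"r = {r:+.4f}")
print(f"mean X = {mean_x:.4f}, mean Y = {mean_y:.4f}")
print(f"Yc = {a:.4f} + {b:.3f} X")
```

The constants obtained this way agree with those found from the normal
equations, as the text states.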


Correlation of Grouped Data 

Commodities differ greatly in the frequency with which they change their
price. Some, such as agricultural implements, are very rigid and change
only after mature deliberation on the part of sellers, while others, like
commodities on an organized exchange, are very flexible, changing
frequently from minute to minute. During the depression which began in
1929, it was noticed that some of those articles, the prices of which seldom
changed, failed to drop so far as those with greater price flexibility.

Was this a general tendency? Is there a close correlation between price
flexibility and price change? With over 700 price series available,
collected by the United States Bureau of Labor Statistics, this question can
be answered rather definitely. But since the pairs of items to be
correlated are large in number, it is easier to group them before undertaking
calculations. First the data are tallied as in Table 160, which resembles
a scatter diagram except that each point, instead of being plotted exactly,
is merely entered in the appropriate cell. Thus a commodity which
changed in price during 93 of the 94 months under consideration (1926-
1933), and also declined during the period 1929-1932 to 18 per cent of its
1929 price, would be tallied in the extreme lower right-hand corner.

Table 161 is a correlation table. The figures in the center of each cell
are taken from Table 160. The $f_y$ values are obtained by adding the
numbers horizontally; the $f_x$ values, by adding vertically. These two sets
of figures will be recognized as frequency distributions of the dependent
and independent variables respectively. The total frequencies, or
commodities N, for each distribution are, of course, the same: 736. The three
other columns and rows in the table are identical with those to which
we are accustomed for computing the mean and standard deviation from
a frequency distribution, except that here we have two frequency
distributions, one of the X values (running horizontally) and another of the Y
values (running vertically). For ease in computation, deviations are
measured in terms of class intervals from assumed means, that of X being
chosen as 44.5 price changes and that of Y as 65 per cent of 1929.

Since XY values are required for r, these also are computed for each
cell and totaled. This is done by multiplying the X deviation by the Y
deviation (shown in the upper part of each cell), and finally multiplying
this product by the appropriate frequency. The results are shown in
boldface type in the lower part of each cell. It will be noticed that the
products in the first and third quadrants are positive, while those in the
second and fourth are, of course, negative. The algebraic total of these
products is shown in the lower right-hand corner of the table. There is no
subscript for f in the expression $\Sigma f d'_x d'_y$ since each cell
frequency is common to an X class and to a Y class.

The data are taken from a scatter diagram which appeared on p. 3 of Gardiner C.
Means, "Industrial Prices and Their Relative Inflexibility," Senate Document No. 13,
74th Congress, 1st Session, 1935. The data on flexibility run from January 1926 through
December 1933, except that no observation was taken for the period December 1929-
January 1930.

TABLE 160

Tabulation of Price Flexibility and Magnitude of Price Change of
736 Commodities

[Tally chart. Vertical scale: magnitude of price change (Y), per cent of 1929.
Horizontal scale: price flexibility (X), number of price changes, 1926-1933.]

Source: Gardiner C. Means, "Industrial Prices and Their Relative Inflexibility,"
Senate Document No. 13, 74th Congress, 1st Session, 1935.

It would be possible now to set up two normal equations and obtain the
estimating equation directly, as was done for the ungrouped data in other
parts of this chapter. Such a procedure would be necessary for certain
non-linear estimating equations. However, it is somewhat simpler to
compute first the correlation coefficient, and then the estimating equation
and standard error of estimate.

The normal equations are:

I. $\Sigma f_y d'_y = Na + b\Sigma f_x d'_x$;

II. $\Sigma f d'_x d'_y = a\Sigma f_x d'_x + b\Sigma f_x (d'_x)^2$.

Making the necessary substitutions, we have:

I. $462 = 736a - 422b$;

II. $-4317 = -422a + 8796b$.

Solved simultaneously, these yield the estimating equation

$$d'_{y_c} = .3561 - .4737\,d'_x.$$

The following table explains the computation of $Y_c$ values from this equation:

X        d'_x     d'_yc             d_yc                   Y_c
                  (a + b d'_x)      (interval x d'_yc)     (assumed mean + d_yc)

 4.5     -4        2.2509            22.509                87.51
24.5     -2        1.3035            13.035                78.04
44.5      0         .3561             3.561                68.56
64.5      2        -.5913            -5.913                59.09
84.5      4       -1.5387           -15.387                49.62

Computation of $Y_c$ values other than mid-values of classes, however, is facilitated by
stating the equation in terms of the original data. When so stated, the equation is
$Y_c = 89.64 - .4737X$. The procedure for making this transformation is explained in
Appendix B, section XXII-5.

We can now find $\sigma_{y_s}$ and r by use of formulae paralleling those used in
ungrouped data.

$$\sigma_{y_s}^2 = \frac{\Sigma f_y (d'_y)^2 - (a\Sigma f_y d'_y + b\Sigma f d'_x d'_y)}{N} = \frac{3822 - [.3561(462) - .4737(-4317)]}{736} = \frac{3822 - 2209.5070}{736} = 2.1909.$$

$$\sigma_{y_s} = \sqrt{2.1909} = 1.4802 \text{ intervals} = 14.802 \text{ per cent}.$$

$$r^2 = \frac{N(a\Sigma f_y d'_y + b\Sigma f d'_x d'_y) - (\Sigma f_y d'_y)^2}{N\Sigma f_y (d'_y)^2 - (\Sigma f_y d'_y)^2} = \frac{736(2209.5070) - (462)^2}{736(3822) - (462)^2} = .5435.$$

$$r = -\sqrt{.5435} = -.7372.$$

TABLE 161

Correlation Table of Price Flexibility and Magnitude of Price Change of
736 Commodities

Rows: magnitude of price change (Y), per cent of 1929. Columns: price flexibility (X),
number of price changes, 1926-1933. Each cell shows the product $d'_x d'_y$ (upper),
the frequency f (center), and $f d'_x d'_y$ (lower).

Class limits                    0-9  10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89  90-99   f_y   d'y  f_y d'y  f_y(d'y)^2
Mid-value                       4.5   14.5   24.5   34.5   44.5   54.5   64.5   74.5   84.5   94.5

160.0-169.9  165   d'xd'y       -40
                   f              1                                                                    1    10       10         100
                   fd'xd'y      -40

150.0-159.9  155   d'xd'y       -36
                   f              1                                                                    1     9        9          81
                   fd'xd'y      -36

140.0-149.9  145                                                                                             8        0           0

130.0-139.9  135                                                                                             7        0           0

120.0-129.9  125                                                                                             6        0           0

110.0-119.9  115   d'xd'y       -20    -15            -5      0
                   f              6      3             2      2                                       13     5       65         325
                   fd'xd'y     -120    -45           -10      0

100.0-109.9  105   d'xd'y       -16    -12            -4      0                          +16    +20
                   f             43      7             2      1                            1      1   55     4      220         880
                   fd'xd'y     -688    -84            -8      0                           16     20

90.0-99.9     95   d'xd'y       -12     -9     -6     -3            +3
                   f             74     16      2     10             1                               103     3      309         927
                   fd'xd'y     -888   -144    -12    -30             3

80.0-89.9     85   d'xd'y        -8     -6     -4     -2      0     +2     +4
                   f             36     35     10     13      2      2      1                         99     2      198         396
                   fd'xd'y     -288   -210    -40    -26      0      4      4

70.0-79.9     75   d'xd'y        -4     -3     -2     -1      0     +1     +2     +3     +4     +5
                   f             23     30     17     12      8      2      3      1      2      4   102     1      102         102
                   fd'xd'y      -92    -90    -34    -12      0      2      6      3      8     20

To obtain r directly from ungrouped data the following formula has
been recommended:

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}.$$

For grouped data, X is replaced by $d'_x$ and Y by $d'_y$; the symbol f is
introduced, and the expression becomes

$$r = \frac{N\Sigma f d'_x d'_y - (\Sigma f_x d'_x)(\Sigma f_y d'_y)}{\sqrt{[N\Sigma f_x (d'_x)^2 - (\Sigma f_x d'_x)^2][N\Sigma f_y (d'_y)^2 - (\Sigma f_y d'_y)^2]}}.$$

Substituting in this formula, we have

$$r = \frac{(736)(-4317) - (-422)(462)}{\sqrt{[(736)(8796) - (-422)^2][(736)(3822) - (462)^2]}} = -.7372.$$

The following measures are readily computed by familiar methods:

$\bar{X} = 38.766$.  $\bar{Y} = 71.277$.

$\sigma_x = 34.090$.  $\sigma_y = 21.906$.

Now since

$$r^2 = 1 - \frac{\sigma_{y_s}^2}{\sigma_y^2},$$

$$\sigma_{y_s}^2 = \sigma_y^2(1 - r^2),$$

$$\sigma_{y_s} = \sigma_y\sqrt{1 - r^2}.$$

Substituting:

$$\sigma_{y_s} = 21.906\sqrt{1 - (-.7372)^2} = 14.802.$$

To obtain the estimating equation, we have the equation
$y_c = r\frac{\sigma_y}{\sigma_x}x$. But since $y = Y - \bar{Y}$, and
$x = X - \bar{X}$, we know that

$$Y_c - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}).$$

Substituting in this equation, we have

$$Y_c - 71.277 = -.7372\,\frac{21.906}{34.090}(X - 38.766),$$

or

$$Y_c = 89.64 - .4737X.$$

These values are not exactly the same as would have been obtained had
the computations been based upon ungrouped data. The difference,
however, is ordinarily very slight and is due to the fact that the items are not
distributed evenly within each cell, and hence the mid-values do not
correspond with the actual means of each cell. The errors tend to offset each
other, provided the X and Y distributions are approximately symmetrical.
But this is merely a tendency and there is almost always a small
discrepancy in the results. In general it may be said that, in order to keep the
inaccuracy within negligible limits, it is well to have at least twelve groups
in each direction.
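The grouped-data computation described above can be retraced from the five
totals of the correlation table. The Python sketch below is illustrative only;
the variable names are ours, and the class interval of 10 for both variables
and the assumed means of 44.5 and 65 are taken from the text.

```python
import math

# Totals from the correlation table (deviations in class-interval units).
N = 736
sum_fdx = -422        # sum of f_x * d'_x
sum_fdy = 462         # sum of f_y * d'_y
sum_fdx2 = 8796       # sum of f_x * (d'_x)^2
sum_fdy2 = 3822       # sum of f_y * (d'_y)^2
sum_fdxdy = -4317     # sum of f * d'_x * d'_y

interval_x, interval_y = 10, 10            # class intervals
assumed_mean_x, assumed_mean_y = 44.5, 65.0

# r is unaffected by the choice of origin and unit, so step deviations suffice.
num = N * sum_fdxdy - sum_fdx * sum_fdy
den = math.sqrt((N * sum_fdx2 - sum_fdx ** 2) * (N * sum_fdy2 - sum_fdy ** 2))
r = num / den

# Means and standard deviations restored to the original units.
mean_x = assumed_mean_x + interval_x * sum_fdx / N
mean_y = assumed_mean_y + interval_y * sum_fdy / N
sigma_x = interval_x * math.sqrt(sum_fdx2 / N - (sum_fdx / N) ** 2)
sigma_y = interval_y * math.sqrt(sum_fdy2 / N - (sum_fdy / N) ** 2)

# Standard error of estimate and estimating equation Yc = a + b X.
std_error_estimate = sigma_y * math.sqrt(1 - r ** 2)
b = r * sigma_y / sigma_x
a = mean_y - b * mean_x

print(f"r = {r:+.4f}")
print(f"mean X = {mean_x:.3f}, mean Y = {mean_y:.3f}")
print(f"sigma_x = {sigma_x:.3f}, sigma_y = {sigma_y:.3f}")
print(f"standard error of estimate = {std_error_estimate:.3f}")
print(f"Yc = {a:.2f} {b:+.4f} X")
```

Small differences in the last decimal place from the figures in the text arise
only from the number of digits carried.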

Causation and the Correlation Coefficient 

The coefficient of correlation must be thought of, not as something that
proves causation, but only as a measure of co-variation. Any one of the
following situations may, in fact, obtain:

1. A variation in one variable may be caused (directly or indirectly) by
a variation in the other. The variable that is supposed to be the cause of
variations in the other is usually taken as the independent variable and
plotted along the X-axis. Thus, because dividends on stocks are thought
to affect stock prices, rather than vice versa, the "dividends" series was
made the independent variable in an earlier illustration. It is a logical
process which determines the statistician's belief that there is causal
relationship between the two variables, and his belief as to which is cause
and which is effect. It must be evident, then, that the coefficient of
correlation in itself does not say that X causes Y, any more than it says that
Y causes X.

2. Co-variation of the two variables may be due to a common cause or
causes affecting each variable in the same way, or in opposite ways. If it
should be found that there is correlation between automobile accidents
per 1,000 persons and per capita federal income tax payments, it should
not hastily be concluded that it takes an automobile accident to jar a
person into paying his income tax; nor is it necessarily true that making
large tax payments incapacitates a person for driving carefully. It is
quite possible, however, that in states where the average income is high,
the income taxes will be large, a large proportion of the people will own
automobiles, and accidents will be numerous.

3. The causal relationship between the two variables may be interacting.
Thus, a high price for a commodity stimulates its production, but increased
production may increase or decrease the cost of a commodity, depending
upon the period of time under observation and whether it is an increasing
or decreasing cost industry.

4. The correlation may be due to chance. Even though there may be
no relationship whatever between the variables in the universe from which
the sample is drawn, it may be that enough of the paired variables that
are selected may vary together, just by chance, to give a fair degree of
correlation. Thus it might be found that in a given group of male
students there was positive correlation between the size of their shoes and
the number of cigarettes in their pockets. Yet it is hard to develop a
theory as to why this should be so, and the chances are that another
sample would yield quite different results. In a later section brief
attention will be given to measurement of the reliability of r.
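The fourth situation, correlation arising by chance, can be illustrated by a
small simulation. The Python sketch below is hypothetical throughout: the
sample size of 10, the 500 trials, the normal distribution of the two series,
and the seed are all our own assumptions, chosen only to show how widely r can
vary from sample to sample when the variables are in fact unrelated.

```python
import math
import random

def correlation(xs, ys):
    """Product-moment coefficient of correlation for two paired lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1935)     # arbitrary seed, fixed only for repeatability
sample_size = 10      # hypothetical small sample
trials = 500

# Each trial draws two unrelated series, so the population correlation is zero.
sample_rs = []
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(sample_size)]
    ys = [random.gauss(0, 1) for _ in range(sample_size)]
    sample_rs.append(correlation(xs, ys))

largest = max(abs(r) for r in sample_rs)
share_fair = sum(abs(r) >= 0.5 for r in sample_rs) / trials

print(f"largest |r| among {trials} samples: {largest:.2f}")
print(f"share of samples with |r| >= .5:  {share_fair:.1%}")
```

With samples this small, a noticeable share of the trials show a "fair degree
of correlation" although none exists in the universe sampled; raising
sample_size shrinks that share.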

Estimate of Correlation in Population 

All of the computations so far have measured the correlation in the 
particular sample. This tends to be higher than the correlation in the 
population from which the sample was drawn, and especially so when the 
sample is small. Estimating the correlation in the population may therefore 
be thought of as making allowance for the size of the sample. The 
best estimate of correlation in the population is from the formula 

$$\bar{r}^2 = 1 - (1 - r^2)\,\frac{N - 1}{N - m},$$

in which m is the number of constants in the estimating equation (always 
2 in case of simple linear correlation). The formula therefore simplifies 
to 

$$\bar{r}^2 = 1 - (1 - r^2)\,\frac{N - 1}{N - 2}.$$

Applying this correction to the forest tree illustration, we have 

$$\bar{r} = \sqrt{1 - (1 - r^2)\,\frac{19}{18}} = +.742.$$

This compares with the sample r of +.758. A similar correction when 
applied to the common stock problem, a sample of 64 items, lowers the 
correlation only from +.817 to +.814. 

If the value of $r^2$ is very low, $\bar{r}^2$ may be negative (and $\bar{r}$ imaginary). 
In such a case the correlation in the population should be considered to 
be zero. 
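
The correction can be sketched in a few lines of Python. This is only an illustration: the function name `population_r` is made up here, and the forest tree sample size of 20 pairs is an assumption inferred from the 18 that appears in the worked figures.

```python
import math

def population_r(r, n, m=2):
    """Estimate the population correlation from a sample r of n pairs.

    m is the number of constants in the estimating equation
    (2 for simple linear correlation).
    """
    r_bar_sq = 1 - (1 - r ** 2) * (n - 1) / (n - m)
    # A zero or negative value means the population correlation
    # is to be considered zero.
    if r_bar_sq <= 0:
        return 0.0
    # Give the estimate the same sign as the sample r.
    return math.copysign(math.sqrt(r_bar_sq), r)

# Common stock problem: 64 items, sample r = +.817.
print(round(population_r(0.817, 64), 3))   # 0.814
# Forest tree illustration: sample r = +.758, N assumed to be 20.
print(round(population_r(0.758, 20), 3))   # 0.742
```

The smaller the sample, the larger the reduction, which is why the 64-item sample is barely affected.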


This formula is derived in Appendix B, section XXII-6. 

Reliability of the Correlation Coefficient 

General measure of reliability. The standard error of the correlation 
coefficient ($\sigma_r$), which is analogous to the standard error of the mean, is 
computed from the expression 

$$\sigma_r = \frac{1 - r_p^2}{\sqrt{N - 1}},$$

where $r_p$ is the correlation in the population. 

This measure of the sampling error of r is subject to certain limitations. 
In the first place, the distribution of sample coefficients around the 
population r is approximately normal only in case the latter is zero. When the 
population r is positive, the sampling distribution is negatively skewed. 
Thus, if the true r is +.80, different sample coefficients can be only .20 
higher, but some might conceivably be as low as -1.00 (a drop of 1.80). 
As $r_p$ approaches zero, the distribution gradually approaches normality. 
No precise line of demarcation can be drawn beyond which it is unwise to 
use $\sigma_r$, but it has been suggested by Tippett that consideration of skewness 
becomes especially important as $r_p$ approaches .80. A second limitation 
is that, even when $r_p$ is zero or nearly so, the distribution of r is not normal 
for small samples, but approaches normality as the sample size is increased. 

We are sometimes interested in testing whether or not there is any 
significant correlation; that is, whether the hypothesis is tenable that there 
is no correlation present in the population. If the hypothesis is discredited, 
the correlation is considered significant. To test this hypothesis, we must 
substitute zero for $r_p$ in the formula for $\sigma_r$, which now becomes 

$$\sigma_r = \frac{1}{\sqrt{N - 1}}.$$

This expression, as indicated in the preceding paragraph, should not be 
used when N is small. 

It will be recalled that a random sample of 64 corporations showed a 
correlation coefficient between dividends and price of +.8173. Using the 
formula above, 

$$\sigma_r = \frac{1}{\sqrt{63}} = .1260.$$

As is usual when testing for significance, we must divide the difference to 
be tested by the appropriate standard error. Thus we have 

$$\frac{r}{\sigma_r} = \frac{.8173}{.1260} = 6.49.$$

Since statisticians usually have considerable confidence that a relationship 
is not due to chance if r is at least three times $\sigma_r$, clearly our correlation is 
significant. If the test indicates lack of significance, the remedy is to 
increase the size of the sample. 

The expression 

$$\sigma_r = \frac{1 - r^2}{\sqrt{N - m}},$$

where m is the number of constants in the estimating equation, is sometimes 
used as a test of the reliability of r. For the 64 corporations 

$$\sigma_r = \frac{1 - .6680}{\sqrt{62}} = .042.$$

The correlation coefficient is then written $r = +.817 \pm .042$, which is 
interpreted to mean that, if the correlation in the population were +.817, 
68.27 per cent of the coefficients computed from random samples of 64 
pairs of items would be expected to vary between +.775 and +.859. This 
is a very crude and unsatisfactory procedure, however, since +.817 is not 
the population figure and, even if the value of $r_p$ were +.817, the 
distribution of sample r's around such an $r_p$ would not be normal. The 
distribution of sample r's would be approximately normal only if N were large 
and $r_p$ were small. Using $-3\sigma_r$, it is sometimes asserted that it is 
unlikely that the value of $r_p$ is below +.817 - 3(.042) = +.691. This, 
however, is an extremely rough application of fiducial probability. More 
satisfactory fiducial limits of r may be obtained by transforming r into Z 
(discussed later in this chapter), ascertaining the desired fiducial limits 
of Z, and converting to r. 

The t test. In order to discover whether or not an observed correlation 
coefficient is significantly greater than zero, we may use a procedure 
which is applicable to both large and small samples. This method 
consists in computing the value t from the expression 

$$t = \frac{r}{\sqrt{1 - r^2}\,/\sqrt{N - m}} = \frac{r\sqrt{N - m}}{\sqrt{1 - r^2}},$$

where m is the number of constants in the estimating equation. Following 
this we consult the t table of Appendix F (described in Chapter 
XII on unreliability), referring to the values of t and of n (n = N - 2), 
and discover how many times in 100 a sample drawn from a population 
with zero correlation would result in a correlation coefficient as high as 
that actually obtained. If this chance is very low, the correlation is 
assumed to be significant. For the illustration just discussed, 

$$t = \frac{.8173\sqrt{62}}{\sqrt{1 - .6680}} = 11.2.$$

Since t gives a ratio of 11.2 and n = 62, it appears that a correlation 
coefficient of this size could hardly have been due to chance if based on a 
sample drawn from a population having zero correlation, and furthermore 
that, if we should correlate yield and price of all common stocks such as 
these, we should find positive correlation to obtain. 

The value of $r/\sigma_r$, when $\sigma_r = 1/\sqrt{N - 1}$, was found to be 6.49 as compared 
with a value for t of 11.2. Now, the t distribution is almost normal when 
n exceeds 30. Since n is 62 in this example, it is apparent that the test 
involving $\sigma_r$ errs on the side of stringency when testing for presence of 
correlation, since this test requires a larger value of r to give a ratio which 
would indicate the same probability as that found by the t test. The t 
test is the more appropriate procedure to use when testing for the presence 
of correlation, since the distribution of sample r's approaches the normal 
curve only as N becomes large, while the ratio $r\sqrt{N - m}/\sqrt{1 - r^2}$ follows the t 
distribution whatever the size of the sample. The required value of t for 
various levels of significance and various degrees of freedom has been 
tabulated in convenient form, as in Appendix F. This table includes all 
values of n from 1 to 30; for larger values of n, we may use the last row 
of the t table, which is taken from the table of normal curve areas. 
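
The two ratios compared above can be reproduced with a short Python sketch. The function names are illustrative only; the formulas are the ones given in the text, with m = 2 constants for simple linear correlation.

```python
import math

def normal_ratio(r, n):
    """r divided by sigma_r = 1 / sqrt(N - 1), the large-sample test."""
    return r * math.sqrt(n - 1)

def t_ratio(r, n, m=2):
    """t = r * sqrt(N - m) / sqrt(1 - r^2), with N - m degrees of freedom."""
    return r * math.sqrt(n - m) / math.sqrt(1 - r ** 2)

r, n = 0.8173, 64          # the 64 corporations
print(round(normal_ratio(r, n), 2))   # 6.49
print(round(t_ratio(r, n), 1))        # 11.2
```

For the same r and N the t ratio is the larger of the two, which is the sense in which the test based on $\sigma_r$ is the more stringent.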

Analysis of variance. An alternative to the t test, which has some 
special advantages in connection with certain types of correlation to be 
treated in subsequent chapters, is the analysis of variance. The elements 
of this method were developed in Chapter XIII. The procedure is: (1) 
compute explained variance ($\sigma_1^2$) and unexplained variance ($\sigma_2^2$) based upon 
degrees of freedom; (2) compute the value of 

$$z = \tfrac{1}{2}\log_e\frac{\sigma_1^2}{\sigma_2^2} = 1.15129\,\log_{10}\frac{\sigma_1^2}{\sigma_2^2} \quad\text{or}\quad F = \frac{\sigma_1^2}{\sigma_2^2};$$

(3) determine whether the explained variance is significantly greater than 
the unexplained by reference to the z table (Appendix G 1) or the F table 
(Appendix G 2). The variances are obtained by dividing the sums of 
squared deviations by the appropriate degrees of freedom, as follows: 


$$\text{Explained:}\quad \sigma_1^2 = \frac{\sum (Y_c - \bar{Y})^2}{m - 1};$$

$$\text{Unexplained:}\quad \sigma_2^2 = \frac{\sum (Y - Y_c)^2}{N - m} = \frac{\sum (Y - Y_c)^2}{N - 2};$$

$$\text{Total:}\quad \sigma^2 = \frac{\sum (Y - \bar{Y})^2}{N - 1}.$$

Applying this method to the common stock illustration, we obtain these 
results: 

Source of         Sum of squared     Degrees of
variance          deviations         freedom         Variance

Explained            7,570.5     ÷       1      =    7,570.5
Unexplained          3,762.8     ÷      62      =       60.690
Total               11,333.3     ÷      63      =      179.894

$$F = \frac{7{,}570.5}{60.690} = 124.74.$$

Referring to Appendix G 2, we find that, when $n_1 = 1$ and $n_2 = 60$ (there 
is no entry for $n_2 = 62$), the .001 level of significance requires that F = 
11.972. Since we have a value for F several times that size, we can 
unhesitatingly say that the explained variance is significantly greater than 
the unexplained, and therefore the correlation between the two variables 
is significant. 
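
As a rough check on the arithmetic, the variance ratio can be computed from the sums of squared deviations quoted for the common stock illustration. The function name and argument names below are illustrative, not part of the original method.

```python
def variance_ratio(explained_ss, unexplained_ss, n, m=2):
    """F = explained variance / unexplained variance.

    The sums of squared deviations are divided by m - 1 and N - m
    degrees of freedom respectively.
    """
    explained_var = explained_ss / (m - 1)
    unexplained_var = unexplained_ss / (n - m)
    return explained_var / unexplained_var

f = variance_ratio(7570.5, 3762.8, 64)
print(round(f, 2))          # 124.74
print(f > 11.972)           # True: beyond the .001 level for n1 = 1, n2 = 60
```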

The Z transformation. Although the last two methods take care of 
small samples, they are applicable only for testing whether the coefficient 
is significantly different from zero. Since the sampling distribution of 
$r\sqrt{N - 2}/\sqrt{1 - r^2}$ becomes more and more skewed as $r_p$ departs from zero, it is 
advisable to transform r into a measure, the sampling distribution of 
which is approximately normal, if we wish to test the divergence of a 
correlation coefficient from some hypothetical value other than zero or to 
test the difference between two correlation coefficients. This may be 
accomplished by means of the Z transformation described by Fisher.²⁰ Thus 
r may be transformed into Z by the following formula: 

$$Z = \tfrac{1}{2}\left[\log_e (1 + r) - \log_e (1 - r)\right] = \tfrac{1}{2}\log_e \frac{1 + r}{1 - r} = 1.15129\,\log_{10}\frac{1 + r}{1 - r}.$$

In the present instance 

$$Z = 1.15129\,\log_{10}\frac{1.8173}{.1827} = 1.1486.$$

The standard error of Z is independent of $r_p$: 

$$\sigma_Z = \frac{1}{\sqrt{N - m - 1}}.$$

Thus 

$$\sigma_Z = \frac{1}{\sqrt{61}} = .12804.$$


20 R. A. Fisher, Statistical Methods for Research Workers, pp. 202-210, Oliver and 
Boyd, Edinburgh, 1938 (7th edition). 

Although not quite so accurate as the t transformation for testing the 
hypothesis that $r_p$ is zero, the Z transformation is substantially accurate. 
For the problem in hand, 

$$\frac{Z}{\sigma_Z} = \frac{1.1486}{.12804} = 8.97.$$

Since the Z distribution is almost normal, the significance of Z may be 
determined by reference to the normal curve areas. Since 8.97 is beyond 
the table limits, Z (and therefore r) is unquestionably significant. 

As stated above, the special province of Z is in testing the significance 
of the difference between the sample r and some known or hypothetical 
population value, or between two sample correlations. In order to do 
this, it is necessary also to transform the known or hypothetical $r_p$, or 
the other sample r, into Z. 

Suppose, for instance, that we wish to test whether our r of +.8173 is 
significantly different from a hypothetical $r_p$ of +.7500. When $r_p$ = +.7500, 

$$Z_p = 1.15129\,\log_{10}\frac{1.7500}{.2500} = .9730.$$

Remembering now that, when r = +.8173, Z = 1.1486, and referring to 
this value of Z as $Z_1$, we may compute 

$$\frac{Z_1 - Z_p}{\sigma_{Z_1}} = \frac{1.1486 - .9730}{.12804} = \frac{.1756}{.12804} = 1.37.$$

Appendix E tells us that we may expect so large a difference from chance 
causes about 17 times in 100. Hence we must conclude that the difference 
is not significant. If, however, we are testing the significance of the 
difference between our r of +.8173 and another sample r of +.7500 
($Z_2$ = .9730) computed from 39 pairs of items, we must compute also 

$$\sigma_{Z_2} = \frac{1}{\sqrt{39 - 2 - 1}} = \frac{1}{6} = .16667.$$

Then 

$$\sigma_{Z_1 - Z_2} = \sqrt{(.12804)^2 + (.16667)^2} = .2102,$$

and 

$$\frac{Z_1 - Z_2}{\sigma_{Z_1 - Z_2}} = \frac{.1756}{.2102} = .84.$$

Our table of normal areas tells us that a difference as large as the one 
obtained is to be expected (even if the two samples are drawn from the same 
population) about 40 times in 100. The difference between these two 
samples, therefore, is not significant. 

As mentioned earlier, Z may be used to ascertain fiducial limits of $r_p$. 
The procedure consists in determining the fiducial limits of Z for the 
desired level of significance and the proper degrees of freedom, and converting 
each of the two values of Z to r. 
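
The Z computations of this section can be sketched as follows. The helper names are illustrative, and the multiplier 1.96 used for the fiducial limits is an assumption (the usual normal deviate for a two-sided 5 per cent level); the text itself does not fix a level.

```python
import math

def fisher_z(r):
    """Z = 1/2 * log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_to_r(z):
    """Convert Z back to r (the inverse of fisher_z)."""
    return math.tanh(z)

def sigma_z(n, m=2):
    """Standard error of Z: 1 / sqrt(N - m - 1)."""
    return 1 / math.sqrt(n - m - 1)

z1, s1 = fisher_z(0.8173), sigma_z(64)
print(round(z1, 4), round(s1, 5))              # 1.1486 0.12804

# Sample r against a hypothetical population value of +.7500.
zp = fisher_z(0.75)
print(round((z1 - zp) / s1, 2))                # 1.37

# Sample r against another sample r of +.7500 from 39 pairs.
z2, s2 = fisher_z(0.75), sigma_z(39)
print(round((z1 - z2) / math.hypot(s1, s2), 2))  # 0.84

# Fiducial limits of r: limits of Z first, then converted back to r.
k = 1.96  # assumed multiplier, not taken from the text
lower, upper = z_to_r(z1 - k * s1), z_to_r(z1 + k * s1)
print(round(lower, 2), round(upper, 2))
```

Because the limits are set on Z and then converted, they come out unequally spaced around r, reflecting the skewness of the sampling distribution of r.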

Correlation of Ranked Data 

Sometimes statistical series are composed of items the exact magnitude 
of which cannot be ascertained but which are ranked according to size. 
Thus, in column 2 of Table 162, we have listed eight tennis players in 
order of their official ranking in 1936 by the United States Lawn Tennis 
Association. Because we wish to inquire whether tennis ability ran true 
to form in 1937, we have given their 1937 ranking order in column 3. 

TABLE 162 

Computation of Values for Correlation of Ranked Data: United States Lawn 
Tennis Association Rankings, 1936 and 1937 

                            Ranking          Difference in rank
                                            [D = Col. 2 - Col. 3]
Player                   1936     1937          +        -          D²
(1)                      (2)      (3)          (4)      (5)         (6)

J. Donald Budge           1        1           ..       ..          ..
Frank A. Parker           2        3           ..        1           1
Bryan M. Grant            3        4           ..        1           1
Robert L. Riggs           4        2            2       ..           4
John Van Ryn              5        8           ..        3           9
Joseph R. Hunt            6        5            1       ..           1
Harold Surface, Jr.       7        6            1       ..           1
C. Gene Mako              8        7            1       ..           1

Total                     ..       ..           5        5          18

Source: United States Lawn Tennis Association as reported by the New York Times. 

(As the table stands, we have the first eight of the twenty ranking players 
in the United States in 1936, who were also listed among the first twenty 
in 1937.) 

Since the coefficient of correlation previously explained is not designed 
to deal with ranked data, we shall use Spearman's rank correlation coefficient, 
usually designated by the symbol $\rho$, the formula for which is 

$$\rho = 1 - \frac{6\sum D^2}{N(N^2 - 1)},$$

in which D refers to the difference in rank between paired items in the 
two series. (This coefficient must not be confused with the coefficient of 
non-linear correlation, which customarily uses the same symbol.) In Table 
162, it will be seen that the sum of the positive differences equals the sum 
of the negative differences, and thereby provides a check on the accuracy 
of the subtractions. Substituting the values in the formula, we have 

$$\rho = 1 - \frac{6(18)}{8(64 - 1)} = +.786.$$

The formula gives the sign of the correlation, positive in this case. Whenever 
there is a tie in rank, the two or more positions should be split among 
the different items. Thus, had Riggs and Parker tied for second and third 
in 1937, each would have been ranked 2.5; while if Riggs, Parker, and 
Grant had tied for second, third, and fourth, each would have received a 
rank of 3. 
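
A minimal Python sketch of the rank method follows, using the rankings of Table 162. The helper `average_ranks`, which splits tied positions equally as described above, is an illustrative addition for the case where raw values rather than ranks are supplied.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1), splitting tied positions equally."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1      # mean of positions i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(ranks_a, ranks_b):
    """rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_a)
    sum_d_sq = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_sq / (n * (n ** 2 - 1))

rank_1936 = [1, 2, 3, 4, 5, 6, 7, 8]
rank_1937 = [1, 3, 4, 2, 8, 5, 6, 7]
print(round(spearman_rho(rank_1936, rank_1937), 3))   # 0.786

# Two items tied for second and third place each receive rank 2.5.
print(average_ranks([10, 20, 20, 30]))                # [1.0, 2.5, 2.5, 4.0]
```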

This formula may also be used for ordinary data by converting the 
numerical data into ranks. For instance, we may rank American League 
baseball players according to their batting averages. A coefficient for 
eight such baseball averages, selected in the same manner as were the 
tennis data, yields a rank correlation coefficient of —.143, which seems to 
indicate that tennis form is more consistent than baseball batting form. 
One reason for using the rank method rather than the more exact method, 
even when actual values are available, is to save time. This saving is 
greatest when there are not very many items to be ranked. Since a 
correlation coefficient is not very reliable when the number of items is small, it 
may sometimes be desirable to make an estimate of the degree of association 
by use of the rougher and more quickly computed $\rho$. 

The reason the rank method is not so accurate as the ordinary method 
is that not all of the information concerning the data is utilized. Thus 
the first differences of the values of the items in a series arranged in order 
of magnitude are almost never constant; usually these differences become 
smaller toward the middle of the array. If such first differences were 
constant, then r and $\rho$ would give identical results. If the values, however, 
are distributed normally, there may be applied to $\rho$ a correction 
which will give the same results that would be obtained directly by 
computing r.²¹ These corrections always serve to increase the correlation; 
however, they are very small, in no case increasing the correlation by so 
much as .02. Furthermore, the correction is not always appropriate. In 
the present illustration we have only the upper tails of (possibly) normal 
distributions; if plotted, they would probably appear as reverse J 
distributions. 


21 Tables of corrected values of $\rho$ are given in some textbooks. See, for instance, 
R. E. Chaddock, Principles and Methods of Statistics, p. 300 and Appendix E; Houghton 
Mifflin Company, Boston, 1925. 

Correlation of Qualitative Distributions 

Fortune Magazine printed a survey of public opinion in its October 1937 
issue, and among the topics included was the popular viewpoint on the 
issue of third terms for Presidents of the United States. The following 
information is derived from a table on page 150 of that issue. 

Attitude toward
third term            Rich       Poor       Total     Per cent

Favorable              508      1,559       2,067      50.587
Unfavorable            905      1,114       2,019      49.413

Total                1,413      2,673       4,086     100.000
Per cent            34.581     65.419     100.000

If the rich and the poor were equally favorable to the principle of a third 
term, we should expect the following percentage distribution in the 
different cells: 

Attitude         Rich                         Poor

Favorable        34.581 × 50.587 = 17.49      65.419 × 50.587 = 33.09
Unfavorable      34.581 × 49.413 = 17.09      65.419 × 49.413 = 32.33

These four percentages total 100; that is, 17.49 + 17.09 + 33.09 + 32.33 
= 100.00. Applying these percentages to the total number of observations 
(4,086), we should expect the following distribution of occurrences: 

Attitude           Rich       Poor       Total

Favorable           715      1,352       2,067
Unfavorable         698      1,321       2,019

Total             1,413      2,673       4,086

We may now compare the observed with the expected frequencies, and 
compute $\chi^2$ as in Table 163 (ordinarily only this table would be 
constructed; the others are purely expository). 

$$\chi^2 = \sum \frac{(f - f_c)^2}{f_c} = 185.447.$$

There is only one degree of freedom for these data: if any one $f - f_c$ value 
is taken, the other $f - f_c$ values are determined by the requirement that 
the total number of rich, poor, favorable, and unfavorable be the same 
as the observed values. A practical method of computing the degrees of 
freedom lost is to add the number of columns to the number of rows and 
subtract 1; here 2 + 2 - 1 = 3 are lost from the four cells, leaving one. 
The $\chi^2$ table (Appendix I) for the .001 level of significance 
requires that $\chi^2$ = 10.827. It is, therefore, almost inconceivable that we 
could obtain a $\chi^2$ as high as 185.447 if attitude toward the question of the 
third term were not (in 1937) related to economic status. That hypothesis 
must therefore be discarded; the relationship between attitude and 
economic status must be held to be significant. 

A further question is: How close is the relationship? This may be 
answered by computing the coefficient of mean square contingency, 

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} = \sqrt{\frac{185.447}{4{,}086 + 185.447}} = .21.$$

The $\chi^2$ test and the coefficient of mean square contingency can be used 
when there are two or more categories for each variable. 
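
The computation can be sketched in Python as below. The function and its `round_expected` flag are illustrative: with the flag set, expected frequencies are rounded to whole numbers as in the worked figures; without it, the unrounded expected frequencies give a slightly smaller $\chi^2$ (roughly 185.1), which leaves the conclusion unchanged.

```python
import math

# Observed frequencies: rows are favorable / unfavorable,
# columns are rich / poor.
observed = [[508, 1559],
            [905, 1114]]

def chi_square(table, round_expected=False):
    """Chi-square of a contingency table against proportional expectation."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            f_c = row_totals[i] * col_totals[j] / n   # expected frequency
            if round_expected:
                f_c = round(f_c)
            chi_sq += (f - f_c) ** 2 / f_c
    return chi_sq, n

chi_sq, n = chi_square(observed, round_expected=True)
c = math.sqrt(chi_sq / (n + chi_sq))   # coefficient of mean square contingency
print(round(chi_sq, 3))                # 185.447
print(round(c, 2))                     # 0.21
```

The same function applies unchanged to tables with more than two rows or columns, since it only uses the row and column totals.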

TABLE 163 

Computation of $\chi^2$: Attitude of Persons on Third Terms for Presidents, by 
Economic Classes 

Attitude and         Observed    Expected    Difference
economic class          f          f_c        f - f_c      (f - f_c)²    (f - f_c)²/f_c

Rich:
  Favorable             508         715        -207          42,849         59.929
  Unfavorable           905         698         207          42,849         61.388
Poor:
  Favorable           1,559       1,352         207          42,849         31.693
  Unfavorable         1,114       1,321        -207          42,849         32.437

Total                 4,086       4,086          ..              ..        185.447

Source: Derived from data on p. 150 of "Fortune Quarterly Survey: X," Fortune, Vol. XVI, No. 4, 
October 1937. 

One serious limitation to C is that its maximum value varies with the 
number of cells in the table, and therefore the values of C obtained from 
tables with different groupings are not comparable. The maximum value 
of C for a 2 × 2 table is .707; for a 5 × 5 table, .894; for a 10 × 10 table, 
.949; and the value approaches 1 as the number of classes is increased. 
The coefficient of mean square contingency (which has no sign) somewhat 
resembles the coefficient of correlation of grouped data, since the value of 
C approaches that of r as the number of classes is increased, provided 
certain conditions (regarding size of sample, normality of distribution, and 
arrangement of categories) are met. 

Another method, known as Sheppard's method of unlike signs, may be 
used for preliminary investigations. It is given by the expression 
cos (U × 1.8°), where U is the percentage of cases of unlike sign. In the 
present instance, 

$$U = \frac{508 + 1{,}114}{4{,}086} = 39.7 \text{ per cent},$$

and the coefficient is 

$$\cos\,(39.7 \times 1.8^\circ) = \cos 71.46^\circ = .32.$$

Appendix L gives a table of cosines. Whenever appropriate, a sign may 
be attached to C or to Sheppard's coefficient. 
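
For completeness, Sheppard's expression is easy to evaluate directly. The function name below is illustrative; the only step added to the formula in the text is the conversion from degrees to radians that Python's `math.cos` requires.

```python
import math

def sheppard_unlike_signs(unlike, total):
    """cos(U * 1.8 degrees), U being the percentage of cases of unlike sign."""
    u = 100 * unlike / total
    return math.cos(math.radians(u * 1.8))

# Rich-favorable and poor-unfavorable are the cases of unlike sign.
print(round(sheppard_unlike_signs(508 + 1114, 4086), 2))   # 0.32
```

When half the cases are of unlike sign the angle is 90° and the coefficient is zero; when none are, the coefficient is 1.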
CHAPTER XXIII 

NON-LINEAR CORRELATION 


The preceding chapter considered the simplest type of relationship 
between two variables: a constant amount of increase in the dependent 
variable associated with a unit increase in the independent variable. Not 
always, however, is the linear hypothesis satisfactory. Although it may 
be practical to estimate the height growth of forest trees from the increase 
in their breast-height diameter by a straight line equation, still the method 
has limitations. The estimating equation was found to be $Y_c = 1.045 + 
1.677X$. But it seems unlikely that a tree which has not grown in 
diameter during 10 years would, nevertheless, have grown a foot in height. 
Thus, while the equation may be used to make estimates, it is probably 
unsatisfactory as the formulation of a scientific law. Any equation for 
these data which is sound theoretically should define a line which passes 
through the point X = 0, Y = 0. Even with such an equation, any 
estimate beyond the range of the data should be considered only as a hypothesis 
to be tested. 

The social sciences abound in non-linear relationships. In the field of 
economics, for instance, demand curves are seldom straight lines. Again, 
the law of diminishing returns as stated by one well-known economist 
reads: "If additional equal quantities of a variable element are added to a 
fixed element, the additional output at first increases but eventually 
declines and finally becomes less than zero."¹ As thus stated, a mathematical 
formulation of the law would describe a curve with two bends in it 
(an equation with four constants), perhaps of the type 
$Y_c = a + bX + cX^2 + dX^3$. 

Transforming Data to Linear Form 

Chart 222 is a scatter diagram showing the relationship between 
production and real price of late cabbage in the United States during the 
years 1920-1936. "Real price" means price relative to the prices of other 
commodities in general: in this instance, real prices are the nominal prices 
divided by index numbers of wholesale prices of all commodities, with the 
year 1926 taken as 100 per cent. The real price is thus an estimate of 
what cabbage would have sold for if the price of commodities in general 
had not changed. In this illustration each series has also been adjusted 
for trend, the trends selected being weighted moving averages which are 
thought to approximate the combined primary × secondary trends. Since 
the cyclical movements are not well-defined, the fluctuations in production 
are thus largely irregular fluctuations, and the price fluctuations are those 
which, by hypothesis, result from the variation in production. Although 
the coefficient of correlation is -.862, the estimating line does not seem 
to describe the relationship so well as might be desired. No doubt a 
more complex equation could be selected which would fit the data more 
closely, but it is always well to give preference to equation types with a 
small number of constants when so doing does not give a markedly poorer 
fit. The simpler the equation, the more reliable are the results. The 
reason for this is easy to understand. 

Chart 222. Production and Price of Late Cabbages in the United States, 1920-1935, 
and Zones of Scatter. (Horizontal axis: production; vertical axis: real price.) 
Estimating equation: $Y_c = 313.3790 - 2.142386X$, shown by solid line. 
(Data of Table 164.) 

1 A. G. Black in Economic Principles and Problems, Walter E. Spahr, Editor, Farrar 
and Rinehart, New York, 1936 (3rd edition), p. 113. 

Suppose a half dozen points are set at random upon a scatter diagram. 
If an equation type with six constants is selected, a curve can be fitted to 
pass through all six points. The correlation thereby appears to be perfect 
when, as a matter of fact, the relationship between the two imaginary 
variables is entirely random. The paradox is explained when we realize 
that increasing the number of constants has an effect similar to reducing 
the size of the sample. A curve with six constants passing through six 
points is no more reliable than the mean of a sample of one item. In 
technical language, each time a constant is added, one degree of freedom 
is sacrificed, and thus, when a six-constant curve is fitted to six points, 
there are no degrees of freedom remaining. In this section, illustrations 
will be given of the transformation of data into such a form that a linear 
equation can reasonably be used. This practice has the desirable effect 
of reducing the constants to two, a and b in the equation type $Y_c = a + bX$. 
The formulae that will be used will be similar to those used in simple 
correlation. It will be recalled that two normal equations were set up: 

I. $\sum Y = Na + b\sum X$. 

II. $\sum XY = a\sum X + b\sum X^2$. 

From these the estimating equation $Y_c = a + bX$ was obtained. Using 
the constants of this equation, an expression $a\sum Y + b\sum XY = \sum Y_c^2$ was 
computed, and this expression was used in obtaining both $\sigma_{ys}$ and r by the 
formulae 

$$\sigma_{ys}^2 = \frac{\sum Y^2 - (a\sum Y + b\sum XY)}{N},$$

$$r^2 = \frac{(a\sum Y + b\sum XY) - \bar{Y}\sum Y}{\sum Y^2 - \bar{Y}\sum Y}.$$

Since $\sum Y_c^2 = a\sum Y + b\sum XY$, the above formulae for $\sigma_{ys}^2$ and $r^2$ may be 
rendered simpler in appearance by writing them: 

$$\sigma_{ys}^2 = \frac{\sum Y^2 - \sum Y_c^2}{N}, \quad\text{or simply}\quad \frac{\sum y_s^2}{N};$$

$$r^2 = \frac{\sum Y_c^2 - \bar{Y}\sum Y}{\sum Y^2 - \bar{Y}\sum Y}, \quad\text{or simply}\quad \frac{\sum y_c^2}{\sum y^2}.$$

Writing the formulae thus may also help us to understand their meaning. 
Referring to the coefficient of determination $r^2$, we find that the 
numerator is the explained variation (which becomes the explained 
variance if divided by N), and that the denominator is the total variation 
(which becomes the total variance if divided by N); hence $r^2$ may be 
thought of as the ratio of the explained to the total variation (as well as 
the ratio of the explained to the total variance). The numerator of the 
fraction is made up of two parts: the explained sum of squares $\sum Y_c^2$; and 
a correction factor $\bar{Y}\sum Y$, which is subtracted, leaving the numerator as 
the explained sum of squared deviations, or simply the explained variation 
(see Appendix B, section XXII-1, equation 4). Likewise the denominator 
consists of the total sum of squares and the same correction factor as 
before. The denominator is thus the sum of squared deviations, or simply 
total variation (see Appendix B, section XIII-2, part B-1). Since the 
correction factor is the same for both explained and total variation, when 
the explained variation is subtracted from the total variation, we obtain 

$$(\sum Y^2 - \bar{Y}\sum Y) - (\sum Y_c^2 - \bar{Y}\sum Y) = \sum Y^2 - \sum Y_c^2,$$

which is the unexplained variation, the numerator of the formula for $\sigma_{ys}^2$. 
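
The relations above can be checked numerically. The sketch below uses a small made-up data set (the five X and Y values are purely hypothetical) and illustrative variable names; it solves the two normal equations and confirms that $a\sum Y + b\sum XY$ equals the sum of the squared computed values.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for a and b in Yc = a + bX."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

xs = [1, 2, 3, 4, 5]          # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)

n = len(xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
syy = sum(y * y for y in ys)

explained_ss = a * sy + b * sxy                    # sum of Yc squared
direct_ss = sum((a + b * x) ** 2 for x in xs)      # same quantity, computed directly
correction = (sy / n) * sy                         # mean of Y times sum of Y

r_sq = (explained_ss - correction) / (syy - correction)
sigma_ys_sq = (syy - explained_ss) / n             # unexplained variation / N

print(round(a, 4), round(b, 4))                    # 2.2 0.6
print(round(explained_ss, 4), round(direct_ss, 4)) # 83.6 83.6
print(round(r_sq, 4), round(sigma_ys_sq, 4))       # 0.6 0.48
```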

Use of logarithms. A scatter diagram of the cabbage data is shown as 
Chart 222. It is apparent that the relationship departs from linearity. 
We saw in Chapter XVI that a time series, the trend of which is concave 
upward, sometimes becomes straight when plotted on semi-logarithmic 
paper, or when the logarithms are plotted on arithmetic paper. This is 
true also of data plotted in the form of a scatter diagram. Before actually 
transforming the data into logarithms, however, it is well to plot them 
first on paper which has been ruled logarithmically. If semi-logarithmic 
paper is used, we could alternately try plotting the dependent variable and 
the independent variable on the logarithmic scale. Paper is also available 
that is ruled logarithmically on both axes. Using this latter type of paper 
the cabbage data show a linear relationship in Chart 223 A. This means 
that a constant percentage increase in real price seems to be associated 
with a constant percentage decrease in production. 

Plotting on paper with logarithmic axes is equivalent to plotting the 
logarithms of the X values and of the Y values on arithmetic paper. This 
has been done in Chart 223 B. Computation of the various measures of 
relationship involves procedures analogous to those already used. Table 
164 gives the computation of values required. The formulae and their 
solution are below: 

Equation type: 

$$\log Y_c = \log a + b \log X,$$

which is the linear form of 

$$Y_c = aX^b.$$

Normal equations: 

I. $\sum \log Y = N \log a + b\sum \log X$; 

II. $\sum (\log X \cdot \log Y) = \log a \sum \log X + b\sum (\log X)^2$. 

These equations yield, in terms of logarithms, a straight estimating line 
from which the squares of the deviations of the logarithms are at a 
minimum. It is not, therefore, a least square fit to the original data, though 
the discrepancy is usually not large. 



Chart 223. Production and Price of Late Cabbages and Zones of Scatter: A. Plotted 
on Logarithmic Paper, and B. Logarithms of Data Plotted on Arithmetic Paper 
(horizontal axis: logarithm of production). Estimating equation: 
$\log Y_c = 6.239331 - 2.149117 \log X$. (Data of Table 164.) 

Substituting values from Table 164 in these normal equations, we have: 

I. $31.066433 = 16 \log a + 31.995866\,b$; 

II. $61.923903 = 31.995866 \log a + 64.076962\,b$. 

These give the estimating equation 

$$\log Y_c = 6.239331 - 2.149117 \log X;$$

or in terms of the original data 

$$Y_c = 1{,}735{,}128\,X^{-2.149117}.$$

The explained sums of squares are: 

$$\sum (\log Y_c)^2 = \log a \sum \log Y + b \sum (\log X \cdot \log Y)$$
$$= (6.239331)(31.066433) - (2.149117)(61.923903)$$
$$= 60.752045.$$

To obtain the standard error of estimate, we compute as follows: 

$$\sigma_{\log ys}^2 = \frac{\sum (\log Y - \log Y_c)^2}{N} = \frac{\sum (\log Y)^2 - \sum (\log Y_c)^2}{N}$$
$$= \frac{60.868566 - 60.752045}{16} = .0072826,$$

$$\sigma_{\log ys} = .085340.$$
We may now proceed to find the zones of scatter, which are shown in 
Charts 223 and 224. X = 100 per cent will be used as the point of 
reference. 

If X = 100, 

$$\log X = 2.000000,$$

and substituting in the equation 

$$\log Y_c = 6.239331 - 2.149117 \log X,$$
$$\log Y_c = 1.941097,$$
$$Y_c = 87.32 \text{ per cent}.$$

The values of $\log Y_c$ and $\sigma_{\log ys}$ must be added before the anti-logarithm 
is obtained; thus 

$$\log Y_c + \sigma_{\log ys} = 1.941097 + .085340 = 2.02644;$$
$$\log Y_c - \sigma_{\log ys} = 1.941097 - .085340 = 1.85576.$$

Looking up the anti-logs of these values in Appendix P, we find: 

antilog ($\log Y_c + \sigma_{\log ys}$) = 106.28 per cent; 
antilog ($\log Y_c - \sigma_{\log ys}$) = 71.74 per cent. 

Similarly: 

antilog ($\log Y_c + 2\sigma_{\log ys}$) = 129.35 per cent; 

antilog ($\log Y_c - 2\sigma_{\log ys}$) = 58.94 per cent. 

antilog ($\log Y_c + 3\sigma_{\log ys}$) = 157.44 per cent; 

antilog ($\log Y_c - 3\sigma_{\log ys}$) = 48.43 per cent. 

Chart 224. Production and Price of Late Cabbage and Zones of Scatter. Estimating 
equation: $\log Y_c = 6.239331 - 2.149117 \log X$. (Data of Table 164.) 

In a similar manner zones of scatter can be obtained for other values 
of X. We can also, however, express the standard error of estimate in 
the form of a scatter ratio by obtaining the anti-log of $\sigma_{\log ys}$. Since $\sigma_{\log ys}$ 
= .085340, plus one scatter ratio is 1.2171; and since $-\sigma_{\log ys}$ = -.085340 
= 9.914660 - 10, minus one scatter ratio = .8216 (the reciprocal of 
1.2171). These results indicate that, regardless of the value of $Y_c$, 

antilog ($\log Y_c + \sigma_{\log ys}$) = 1.2171 $Y_c$; 
antilog ($\log Y_c - \sigma_{\log ys}$) = .82160 $Y_c$. 

When $Y_c$ = 87.32, as in the illustration above, 

antilog ($\log Y_c + \sigma_{\log ys}$) = 1.2171(87.32) = 106.28 per cent; 

antilog ($\log Y_c - \sigma_{\log ys}$) = .82160(87.32) = 71.74 per cent. 

These are the same values that were obtained above. The ratio used in 
obtaining antilog ($\log Y_c + 2\sigma_{\log ys}$) is the antilog of $2\sigma_{\log ys}$; or, the ratio 
may be obtained directly by squaring one scatter ratio. The other ratios 
may be similarly obtained. 
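
The zones of scatter and the scatter ratios can be sketched in Python from the constants of the estimating equation. The function name `zone` is illustrative; k is the number of standard errors added to (or, if negative, subtracted from) $\log Y_c$ before the anti-logarithm is taken.

```python
import math

LOG_A = 6.239331        # constants of the estimating equation
B = -2.149117
SIGMA_LOG_YS = 0.085340 # standard error of estimate, in logarithms

def zone(x, k):
    """Antilog of (log Yc + k * sigma) at production x."""
    log_yc = LOG_A + B * math.log10(x)
    return 10 ** (log_yc + k * SIGMA_LOG_YS)

for k in (0, 1, -1, 2, -2, 3, -3):
    print(k, round(zone(100, k), 2))
# The values for X = 100 agree with the text to within rounding:
# 87.32; 106.28 and 71.74; 129.35 and 58.94; 157.44 and 48.43.

plus_one_ratio = 10 ** SIGMA_LOG_YS      # about 1.2171
minus_one_ratio = 10 ** -SIGMA_LOG_YS    # about .8216, its reciprocal
print(round(plus_one_ratio, 4), round(minus_one_ratio, 4))
```

Because the zones are set off in logarithms, each boundary is a fixed multiple of $Y_c$, so the band widens in absolute terms wherever the estimated price is higher.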

The coefficient of curvilinear correlation is usually referred to as the 
index of correlation, and may be designated by the symbol ρ to distinguish 
it from the ordinary coefficient of correlation; ρ² may be called 
the index of determination. No positive or negative sign is set before ρ, 
though in the case of the cabbage data it might seem logical to consider 
the sign negative. However, for certain non-linear relationships, the slope 
is positive in some parts of the curve and negative in others. Illustrations 
of such curves will be found later in this chapter. 

$\rho_{\log Y \log X}$ is easily obtained by substituting in the formula: 

$$\rho^2_{\log Y \log X} = \frac{\sum (\log y_c)^2}{\sum (\log y)^2} = \frac{\sum (\log Y_c)^2 - \overline{\log Y}\,\sum \log Y}{\sum (\log Y)^2 - \overline{\log Y}\,\sum \log Y},$$ 

where $\overline{\log Y}$ is the mean of the log Y values. 

$$\rho^2_{\log Y \log X} = \frac{60.752045 - (1.941652)(31.066433)}{60.868566 - (1.941652)(31.066433)} = \frac{.431843}{.548364} = .7875.$$ 

$$\rho_{\log Y \log X} = .887.$$ 

Using the logarithms of the X and Y observations has increased the 
correlation from r = -.862 to ρ = .887. 

Of course, $\rho^2_{\log Y \log X}$ tells us the proportion of variation in (or variance 
of) the logarithms of the Y values that has been explained by reference 
to the logarithms of the X values. It is the ratio of the variation in the 
computed log Y values to the variation in the actual log Y values. 
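
As a check on the arithmetic, the index of determination can be recomputed from the three sums quoted above. The sketch below assumes only those sums and N = 16; the function name `index_of_determination` is ours.

```python
import math

N = 16
SUM_LOG_Y = 31.066433       # sum of log Y
SUM_LOG_Y_SQ = 60.868566    # sum of (log Y)^2
SUM_LOG_YC_SQ = 60.752045   # sum of (log Yc)^2

def index_of_determination(sum_yc_sq, sum_y_sq, sum_y, n):
    """Ratio of variation in computed values to variation in actual values.
    Both variations are measured from the mean of the actual values."""
    correction = (sum_y / n) * sum_y   # mean times sum
    return (sum_yc_sq - correction) / (sum_y_sq - correction)

rho_sq = index_of_determination(SUM_LOG_YC_SQ, SUM_LOG_Y_SQ, SUM_LOG_Y, N)
rho = math.sqrt(rho_sq)   # the index of correlation carries no sign
print(round(rho_sq, 4), round(rho, 3))
```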

Also, $\rho_{\log Y \log X}$ can be obtained directly by the following expression, 
which parallels the formula used for r: 

$$\rho_{\log Y \log X} = \frac{N\sum \log X \log Y - (\sum \log X)(\sum \log Y)}{\sqrt{[N\sum(\log X)^2 - (\sum \log X)^2][N\sum(\log Y)^2 - (\sum \log Y)^2]}}$$ 

$$= \frac{16(61.923903) - (31.995866)(31.066433)}{\sqrt{[16(64.076962) - (31.995866)^2][16(60.868566) - (31.066433)^2]}}$$ 

$$= .887,$$ 

the negative sign of the numerator being disregarded, since ρ has no sign. 
This type of formula cannot be used when there are more than two 
constants in the estimating equation. 
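
The direct expression is easy to evaluate mechanically. The following sketch uses the sums quoted in the text; the helper name `product_moment` is an assumption of ours. Note that the raw result is negative, and that it is the absolute value which is reported as ρ.

```python
import math

def product_moment(n, sx, sy, sxy, sxx, syy):
    """Product-moment formula written in terms of raw sums."""
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# Sums of log X, log Y, their product, and their squares (from the text).
value = product_moment(
    n=16,
    sx=31.995866,
    sy=31.066433,
    sxy=61.923903,
    sxx=64.076962,
    syy=60.868566,
)
rho = abs(value)  # the index of correlation is stated without sign
print(round(value, 3), round(rho, 3))
```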

Use of reciprocals. Since the price of cabbage decreases as production 
increases, it is not unreasonable to hypothesize that the relationship is 
reciprocal, that is, that it may be described by an equation of the type² 

$$\frac{1}{Y_c} = a + bX.$$ 

Reciprocals of the Y values were obtained from Appendix O. These 
have been plotted against the X values in Chart 225. The results seem 
to discredit the reciprocal hypothesis. The values for 1922 and 1932, 
which were slightly too low in Chart 222, now seem much too high. 
Although the relationship seems linear, except for these two observations, 

TABLE 165 

Computation of Values Used in Determining Reciprocal Measures of Relationship 
Between Production (X) and Real Price (Y) of Late Cabbage, 1920-1935 

(For convenience in computation the X and Y variables, which are ratios to trend, have been 
considered as decimals rather than percentages.) 

Year        X        1/Y       X(1/Y)        X²       (1/Y)² 
1920      1.393   2.083333    2.902083    1.940449   4.340276 
1921       .686    .536769     .368224     .470596    .288121 
1922      1.083   1.865672    2.020523    1.172889   3.480732 
1923       .810    .940734     .761995     .656100    .884980 
1924      1.137   1.290323    1.467097    1.292769   1.664933 
1925       .991    .867303     .859497     .982081    .752214 
1926      1.025   1.113586    1.141426    1.050625   1.240074 
1927      1.180   1.533742    1.809816    1.392400   2.352365 
1928       .873    .705716     .616090     .762129    .498035 
1929       .910    .846740     .770533     .828100    .716969 
1930      1.006   1.152074    1.158986    1.012036   1.327275 
1931       .962   1.152074    1.108295     .925444   1.327275 
1932      1.086   2.109705    2.291140    1.179396   4.450855 
1933       .775    .530786     .411359     .600625    .281734 
1934      1.217   1.547988    1.883901    1.481089   2.396267 
1935      1.101   1.663894    1.841931    1.212201   2.768543 
Total    16.235  19.940439   21.412896   16.958929  28.770648 

Source of data: Table 164. 

it is not apparent that the correlation is higher than for the arithmetic 
relationship, and it seems lower than for the logarithmic relationship. 

2 Alternately, the type $Y_c = a + b\frac{1}{X}$ could be used. If the Y values of this 
illustration are plotted against the reciprocals of the X values, it will be noticed that the 
relationship between the variables is linear, and is much closer than that shown in Chart 
225. Nevertheless, the equation type $\frac{1}{Y_c} = a + bX$ was chosen for purposes of 
exposition, since this equation involves a problem in connection with the standard error of 
estimate not encountered with the former type. 

It is possible to correlate the X values with the $\frac{1}{Y}$ values in a 
fashion precisely similar to that with which the reader is familiar. The 
normal equations are of the following type: 

$$\text{I.}\quad \sum \frac{1}{Y} = Na + b\sum X;$$ 

$$\text{II.}\quad \sum \frac{X}{Y} = a\sum X + b\sum X^2.$$ 

Chart 225. Production and Reciprocals of Price of Late Cabbage and Zones of Scatter. 
Estimating equation: $\frac{1}{Y_c} = -1.219147 + 2.429738X$. (Data of Table 165.) 
[Vertical axis: reciprocal of real price; horizontal axis: production.] 

The necessary computations are shown in Table 165. Substituting the 
values obtained, we have: 

I. 19.940439 = 16a + 16.235b; 

II. 21.412896 = 16.235a + 16.958929b. 

The estimating equation is found to be 

$$\frac{1}{Y_c} = -1.219147 + 2.429738X.$$ 

Continuing in the usual fashion, we have for the explained sums of squares: 

$$\sum \left(\frac{1}{Y_c}\right)^2 = a\sum \frac{1}{Y} + b\sum \frac{X}{Y} = (-1.219147)(19.940439) + (2.429738)(21.412896) = 27.717401.$$ 

We now obtain the standard error of estimate as follows: 

$$\sigma^2_{\frac{1}{y_s}} = \frac{\sum \left(\frac{1}{Y}\right)^2 - \sum \left(\frac{1}{Y_c}\right)^2}{N} = \frac{28.770648 - 27.717401}{16} = .065828.$$ 

$$\sigma_{\frac{1}{y_s}} = .2566.$$ 
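
The two normal equations and the standard error that follows from them can be verified with a short sketch. It assumes only the totals of Table 165; the function name `solve_line` and the variable names are ours, and the solution is written out by determinants rather than by the elimination a hand computer would use.

```python
import math

# Totals from Table 165 (X and Y expressed as decimals).
N = 16
SUM_X = 16.235
SUM_R = 19.940439     # sum of 1/Y
SUM_XR = 21.412896    # sum of X times 1/Y
SUM_XX = 16.958929    # sum of X squared
SUM_RR = 28.770648    # sum of (1/Y) squared

def solve_line(n, sx, sy, sxy, sxx):
    """Solve the two normal equations of a straight line for a and b."""
    det = n * sxx - sx * sx
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b

a, b = solve_line(N, SUM_X, SUM_R, SUM_XR, SUM_XX)

# Sum of squares of the computed reciprocals, then the standard error.
explained = a * SUM_R + b * SUM_XR
sigma = math.sqrt((SUM_RR - explained) / N)
print(round(a, 6), round(b, 6), round(explained, 4), round(sigma, 4))
```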


Chart 225, on which are plotted the reciprocals of the Y values, shows 
the estimating line and zones of $\pm\sigma_{\frac{1}{y_s}}$, $\pm 2\sigma_{\frac{1}{y_s}}$, and $\pm 3\sigma_{\frac{1}{y_s}}$, while Chart 226 is the 
same except that natural numbers are plotted along both axes. A word 
of explanation is advisable concerning the method of obtaining the zones 
of scatter in Chart 226. For illustrative purposes we shall find the Yc 
value necessary for plotting when production is 100 per cent. The Yc 
values corresponding to other X values are obtained in a similar fashion. 
If X = 1.00, substituting in the estimating equation 

$$\frac{1}{Y_c} = -1.219147 + 2.429738X,$$ 

we find that 

$$\frac{1}{Y_c} = 1.210591, \text{ and } Y_c = 82.60 \text{ per cent}.$$ 

$$\frac{1}{Y_c} - \sigma_{\frac{1}{y_s}} = 1.210591 - .2566 = .953991.$$ 

$$\frac{1}{Y_c} + \sigma_{\frac{1}{y_s}} = 1.210591 + .2566 = 1.467191.$$ 

Looking up the reciprocals of these values in Appendix O, we find 

$$\text{reciprocal}\left(\frac{1}{Y_c} - \sigma_{\frac{1}{y_s}}\right) = 104.82 \text{ per cent, and}$$ 
$$\text{reciprocal}\left(\frac{1}{Y_c} + \sigma_{\frac{1}{y_s}}\right) = 68.16 \text{ per cent}.$$ 

It should be observed that the reciprocal values are combined before the 
final result is obtained. Similarly: 

$$\text{reciprocal}\left(\frac{1}{Y_c} - 2\sigma_{\frac{1}{y_s}}\right) = 143.39 \text{ per cent, and}$$ 
$$\text{reciprocal}\left(\frac{1}{Y_c} + 2\sigma_{\frac{1}{y_s}}\right) = 58.01 \text{ per cent};$$ 

$$\text{reciprocal}\left(\frac{1}{Y_c} - 3\sigma_{\frac{1}{y_s}}\right) = 226.86 \text{ per cent, and}$$ 
$$\text{reciprocal}\left(\frac{1}{Y_c} + 3\sigma_{\frac{1}{y_s}}\right) = 50.50 \text{ per cent}.$$ 

The zones of scatter for other values of X may be obtained in a similar 
fashion. The width of each zone will differ for each value of X, as can 
be seen by an inspection of Chart 226. 
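
The reciprocal zones can be sketched as follows. The constants are those of the estimating equation and standard error quoted in the text; the function name `reciprocal_zone`, and the convention of returning infinity when the lower reciprocal is not positive, are assumptions made for this illustration.

```python
import math

A = -1.219147   # constant term of 1/Yc
B = 2.429738    # coefficient of X (X as a decimal, 1.00 = 100 per cent)
SIGMA = 0.2566  # standard error of estimate, in reciprocals

def reciprocal_zone(x, k):
    """Return (lower, estimate, upper) in per cent for k standard errors.
    The standard error is combined with 1/Yc before the reciprocal is taken."""
    r = A + B * x
    high_r = r + k * SIGMA          # larger reciprocal gives the lower price
    low_r = r - k * SIGMA           # smaller reciprocal gives the upper price
    lower = 100 / high_r
    upper = 100 / low_r if low_r > 0 else math.inf
    return lower, 100 / r, upper

for k in (1, 2, 3):
    print(k, [round(v, 2) for v in reciprocal_zone(1.00, k)])
print([round(v, 2) for v in reciprocal_zone(0.60, 1)])
```

The last line shows why the width of the zone differs with X: at X = .60 the quantity 1/Yc less one standard error is already below zero, so the upper limit is unbounded.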


Chart 226. Production and Price of Late Cabbage and Zones of Scatter. Estimating 
equation: $\frac{1}{Y_c} = -1.219147 + 2.429738X$. (Data of Table 165.) 
[Vertical axis: real price.] 

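Just as r² may be thought of as the proportion of total variation in the 
original data that has been explained, so $\rho^2_{\frac{1}{Y}X}$ is the proportion of total 
variation in the reciprocals of the original data that has been explained. 
That is, it is the ratio of the variation in the computed reciprocals to the 
variation in the reciprocals of the original data. Thus 

$$\rho^2_{\frac{1}{Y}X} = \frac{\sum \left(\frac{1}{Y_c}\right)^2 - \overline{\left(\frac{1}{Y}\right)}\sum \frac{1}{Y}}{\sum \left(\frac{1}{Y}\right)^2 - \overline{\left(\frac{1}{Y}\right)}\sum \frac{1}{Y}},$$ 

where $\overline{\left(\frac{1}{Y}\right)}$ is the mean of the $\frac{1}{Y}$ values. 

$$\rho^2_{\frac{1}{Y}X} = \frac{27.717401 - 24.851310}{28.770648 - 24.851310} = \frac{2.866091}{3.919338} = .7313.$$ 

$$\rho_{\frac{1}{Y}X} = .855.$$ 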

As stated before, ρ has no sign. In the present instance, although the 
line of relationship is a linear fit in terms of reciprocals of Y and has a 
positive slope in terms of reciprocals (see Chart 225), yet, when the 
computed values are re-converted into the original units, the slope becomes 
negative. 

When there are only two constants in the estimating equation, $\rho_{\frac{1}{Y}X}$ can 
be computed directly as usual: 

$$\rho_{\frac{1}{Y}X} = \frac{N\sum \frac{X}{Y} - \sum X \sum \frac{1}{Y}}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum \left(\frac{1}{Y}\right)^2 - \left(\sum \frac{1}{Y}\right)^2\right]}}$$ 

$$= \frac{16(21.412896) - (16.235)(19.940439)}{\sqrt{[16(16.958929) - (16.235)^2][16(28.770648) - (19.940439)^2]}}$$ 

$$= .855.$$ 

The choice between the different methods of expressing relationship is 
not always clear. In this instance the coefficient of correlation is highest 
when the relationship is assumed to be logarithmic (ρ = .887), and smallest 
when it is assumed to be reciprocal (ρ = .855). However, the difference 
is not very large, and for purposes of estimation other factors must be 
taken into consideration. 

By which method is the error of estimate reduced to smallest magnitude 
in absolute terms? As can be seen from Table 166, the range of error to 
be expected for 68.27 per cent of the items is 43.88 per cent for each value 
of X, by the arithmetic method. By the reciprocal method, the error is 
smallest for high values of X, but by far the largest for small values of X. 
When X is 60, the range is infinity. Another consideration is the 
distribution of the scatter around the different estimating lines. The 
distribution appears to be more nearly normal around the line of section B of Chart 
223 than is the case with Chart 222 or Chart 225. Probably the 
logarithmic curve is the best of the three under consideration. 

All three methods are consistent with the law of demand commonly set 
down by economists. The logarithmic method also involves the 
assumption that the flexibility of price³ is the same, regardless of market supply. 


Curves with More than Two Constants 

Second degree curve. It is well known that the per capita expense of 
city administration increases with the size of the city. For instance, in a 
large city many policemen are required to regulate traffic. Congested 

TABLE 166 

Range of Standard Error Involved in Three Different Assumptions Concerning 
Relationship of Production and Real Price of Late Cabbage, 1920-1935 

(Per cent) 

Measure                          X      Linear             Logarithmic        Reciprocal 
Yc                              60      184.84             261.74             418.94 
                               100       99.14              87.32              82.60 
                               140       13.44              42.37              45.82 
Yc ± one standard error         60      162.90 to 206.78   215.05 to 318.58   201.90 to ∞ 
                               100       77.20 to 121.08    71.74 to 106.28    68.16 to 104.82 
                               140       -8.50 to 35.38     34.81 to 51.57     41.00 to 51.92 
Range from -σ to +σ             60       43.88             103.53              ∞ 
                               100       43.88              34.54              36.66 
                               140       43.88              16.76              10.92 

Source: Derived from Tables 164 and 165. 


areas are a breeding ground for criminals, and the opportunity for crime 
is more prevalent than in rural areas. As an illustration of the tendency 
of police expense per capita to increase with population of city, we use 
those cities between 50,000 and 300,000 population in 7 selected mid- 
western states. It was necessary to choose states which as a group were 


3 The equation $Y_c = 1{,}735{,}128X^{-2.149117}$ indicates that the flexibility of price is 
-2.149117. For a general method of determining flexibility of price at any value 
of X, see Appendix B, section XXIII-1. 

fairly homogeneous; otherwise the tendency would be obscured. Also, it 
was deemed advisable to omit certain places (for example, Cicero, Illinois) 
which are in such close proximity to much larger cities that their police 
problem is closely tied up with the larger places. Only 17 cities are in- 
cluded in our sample. The results are therefore not very reliable, but the 
smallness of the sample facilitates the illustration of the application of 
the method. 

Chart 227. Population and Per Capita Police Department Expense of 17 Mid-Western 
Cities, 1925. (Group III and Group IV Cities of Wisconsin, Minnesota, Illinois, 
Iowa, Kansas, Nebraska, and Missouri. These cities range in population between 
50,000 and 300,000. Kansas City, Kansas; East St. Louis, Illinois; Cicero, Illinois; 
and Oak Park, Illinois, are omitted, since they are part of metropolitan areas of much 
larger cities. Data from Bureau of the Census, Financial Statistics of Cities, 1925.) 
[Vertical axis: per capita expense, dollars.] 

From Chart 227 it is obvious that the relationship between the variables 
is curvilinear and that if an equation of the type $Y_c = a + bX + cX^2$ is 
used, b will be positive and (since the increment of increase is decreasing) 
c will be negative. The normal equations for a curve of this type are 
three, since there are three constants: 

$$\text{I.}\quad \sum Y = Na + b\sum X + c\sum X^2;$$ 

$$\text{II.}\quad \sum XY = a\sum X + b\sum X^2 + c\sum X^3;$$ 

$$\text{III.}\quad \sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4.$$ 

The computation of values required for these equations is given in Table 167. 

Perhaps the simplest way⁴ to solve these three equations 
simultaneously is: 

(1) Solve equations I and II simultaneously so as to eliminate a, and 
thus leave a resulting equation A. 

I. 34.55 = 17a + 1,564.9b + 195,852.89c, 

II. 3,504.60 = 1,564.9a + 195,852.89b + 32,660,157.11c. 

Multiplying I by 92.052941: 

I. 3,180.429 = 1,564.9a + 144,053.65b + 18,028,835c 
II. 3,504.600 = 1,564.9a + 195,852.89b + 32,660,157c 

A. 324.171 = 51,799.24b + 14,631,322c. 

(2) Similarly, solve equations II and III simultaneously so as to 
eliminate a. Call the resulting equation B. 

B. 46,531.23 = 8,148,461b + 2,430,267,000c. 

(3) Solve equations A and B simultaneously so as to eliminate b. Thus, 
multiplying A by 157.30850: 

A. 50,994.85 = 8,148,461b + 2,301,631,000c 

B. 46,531.23 = 8,148,461b + 2,430,267,000c 

4,463.62 = -128,636,000c 

c = -.0000346996. 

(4) Substitute the value of c in either equation A or equation B. 

B. 46,531.23 = 8,148,461b + (2,430,267,000)(-.0000346996) 
b = .0160595. 


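4 Simultaneous solution of three equations may be avoided by the following 
procedure. Set up normal equations I and II as follows: 

$$\text{I.}\quad a = \frac{\sum Y - b\sum X - c\sum X^2}{N};$$ 

$$\text{II.}\quad a = \frac{\sum XY - b\sum X^2 - c\sum X^3}{\sum X}.$$ 

Then substitute these expressions in equations II and III respectively: 

$$\text{II}'.\quad \sum XY = \frac{\sum Y - b\sum X - c\sum X^2}{N}\sum X + b\sum X^2 + c\sum X^3;$$ 

$$\text{III}'.\quad \sum X^2Y = \frac{\sum XY - b\sum X^2 - c\sum X^3}{\sum X}\sum X^2 + b\sum X^3 + c\sum X^4.$$ 

These two equations are then solved simultaneously for b and c, after which a is 
obtained by substitution of b and c in one of the normal equations. 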



TABLE 167 

Computation of Values Required for Measurement of Relationship Between Population 
and Per Capita Police Department Expense of 17 Mid-Western Cities, 1925 

(X = population, in thousands; Y = per capita police department expense, in dollars.) 

City             X       Y        X²            X³             XY         X²Y        Y² 
St. Paul       246.0   2.84   60,516.00   14,886,936.00     696.640   171,865.44   8.0656 
Omaha          211.8   2.76   44,859.24    9,501,187.03     584.568   123,811.50   7.6176 
Des Moines     140.7   2.36   19,796.49    2,785,366.14     332.052    46,719.72   5.5696 
Duluth         110.5   2.44   12,210.25    1,349,232.62     269.620    29,793.01   5.9536 
Wichita         89.4   2.10    7,992.36      714,516.98     187.740    16,783.96   4.4100 
Peoria          81.6   2.08    6,658.56      543,338.50     169.728    13,849.80   4.3264 
St. Joseph      78.3   2.27    6,130.89      480,048.69     177.741    13,917.12   5.1529 
Rockford        76.5   1.97    5,852.25      447,697.12     150.705    11,528.93   3.8809 
Sioux City      76.2   2.25    5,806.44      442,450.73     171.450    13,064.49   5.0625 
Racine          67.7   1.58    4,583.29      310,288.78     106.966     7,241.60   2.4964 
Springfield     62.8   2.21    3,943.84      247,673.15     138.788     8,715.89   4.8841 
Lincoln         60.6   1.18    3,672.36      222,545.02      71.508     4,333.38   1.3924 
Topeka          55.7   1.73    3,102.49      172,808.69      96.361     5,367.31   2.9929 
Decatur         53.2   1.11    2,830.24      150,568.77      59.052     3,141.57   1.2321 
Davenport       52.7   2.19    2,777.29      146,363.18     115.413     6,082.27   4.7961 
Kenosha         50.9   2.04    2,590.81      131,872.23     103.836     5,285.25   4.1616 
Cedar Rapids    50.3   1.44    2,530.09      127,263.53      72.432     3,643.33   2.0736 
Total        1,564.9  34.55  195,852.89   32,660,157.11   3,504.600   485,144.57  74.0683 

$\sum X^4 = 6{,}517{,}803{,}858$; $\sum Y_c^2 = 72.4014$. 
(5) Substitute the values of b and c in equation I, II, or III. 

I. 34.55 = 17a + (1,564.9)(.0160595) + (195,852.89)(-.0000346996) 
a = .953795. 

(6) Check accuracy by substituting values of a, b, and c in either of 
the two remaining normal equations. 

II. 3,504.60 = (1,564.9)(.953795) + (195,852.89)(.0160595) 
+ (32,660,157)(-.0000346996) 
= 3,504.599, which agrees with 3,504.60. 

(7) The estimating equation is 

$$Y_c = .953795 + .0160595X - .0000346996X^2.$$ 
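
Steps (1) to (5) can be followed mechanically. The sketch below reproduces the successive eliminations in floating-point arithmetic, using the sums that appear in the normal equations and in Table 167; the helper name `eliminate` and the tuple layout are ours, and the results agree with the hand computation only to about the number of digits carried there.

```python
# Each normal equation is stored as (constant term, coef. of a, of b, of c).
eq1 = (34.55, 17.0, 1564.9, 195852.89)
eq2 = (3504.60, 1564.9, 195852.89, 32660157.11)
eq3 = (485144.57, 195852.89, 32660157.11, 6517803858.0)

def eliminate(upper, lower, idx):
    """Multiply `upper` so that its coefficient at position idx equals
    that of `lower`, then subtract it from `lower`."""
    factor = lower[idx] / upper[idx]
    return tuple(l - factor * u for u, l in zip(upper, lower))

eq_a = eliminate(eq1, eq2, 1)    # step (1): a eliminated from I and II
eq_b = eliminate(eq2, eq3, 1)    # step (2): a eliminated from II and III
eq_c = eliminate(eq_a, eq_b, 2)  # step (3): b eliminated from A and B

c = eq_c[0] / eq_c[3]                               # step (3)
b = (eq_a[0] - eq_a[3] * c) / eq_a[2]               # step (4)
a = (eq1[0] - eq1[2] * b - eq1[3] * c) / eq1[1]     # step (5)

# Step (6): check by substituting in a normal equation not yet used for a.
check = eq2[1] * a + eq2[2] * b + eq2[3] * c
print(a, b, c, check)
```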

It should be noticed that six digits are included in the values of a, b, 
and c, though perhaps only five are significant for a. This necessitates 
more than six digits in the various equations necessary to obtain these 
values, because the multiplications required magnify the inaccuracies 
inherent in rounding. In general, however, it is 
better to show too many digits than too few. As the computations 
proceed toward their final conclusion, figures that lose their significance may 
be dropped. 

From the estimating equation, the desired Yc values may be computed 
as shown below. 

  X      a + bX       cX²      Yc (a + bX + cX²) 
  50      1.756      -.087       $1.67 
 100      2.559      -.347        2.21 
 150      3.363      -.781        2.58 
 200      4.166     -1.388        2.78 
 250      4.969     -2.169        2.80 

Values within the range of data only have been included, 
since in themselves the original observations afford no evidence beyond 
their range. Furthermore, in this instance it seems illogical even to 
hypothesize that the equation will be useful if extended far in either 
direction. Notice that if unduly extended the equation implies that a town 
without population would spend $.95 per capita on police, and that per 
capita expense would eventually decline, and even become negative, with 
increased size of city. 
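
The computed values, and the behavior of the equation when it is unduly extended, may be examined with a few lines of code. The constants are those of the estimating equation; the function name `police_expense` and the choice of 600 (thousand) as an example of an unduly large city are ours.

```python
A = 0.953795
B = 0.0160595
C = -0.0000346996

def police_expense(x):
    """Computed per capita police expense (dollars) for a city of
    x thousand inhabitants, from the second degree equation."""
    return A + B * x + C * x * x

for x in (50, 100, 150, 200, 250):
    print(x, round(police_expense(x), 2))

# Outside the range of the data the equation behaves illogically.
print(round(police_expense(0), 2))    # a town without population
turning_point = -B / (2 * C)          # population at which Yc stops rising
print(round(turning_point, 1), police_expense(600) < 0)
```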

The formula for $\sigma^2_{y_s}$ is of the same type that was used in linear correlation: 

$$\sigma^2_{y_s} = \frac{\sum y_s^2}{N} = \frac{\sum Y^2 - \sum Y_c^2}{N} = \frac{\sum Y^2 - (a\sum Y + b\sum XY + c\sum X^2Y)}{N}.$$ 

Proof that $\sum Y_c^2 = a\sum Y + b\sum XY + c\sum X^2Y$ is similar to that shown for 
equation 3 in Appendix B, section XXIII-1. 

From Table 167 we see that $\sum Y_c^2 = 72.4014$; hence 

$$\sigma^2_{y_s} = \frac{74.0683 - 72.4014}{17} = .09805,$$ 

$$\sigma_{y_s} = \$.313.$$ 

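We may now proceed to find ρ in the usual fashion. 

$$\rho^2 = \frac{\sum y_c^2}{\sum y^2} = \frac{\sum Y_c^2 - \bar{Y}\sum Y}{\sum Y^2 - \bar{Y}\sum Y}$$ 

$$= \frac{72.4014 - (2.03235)(34.55)}{74.0683 - (2.03235)(34.55)} = \frac{2.18371}{3.85061} = .5671;$$ 

$$\rho = .753.$$ 

As before, ρ has no sign. 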
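
Both the standard error of estimate and the index of correlation follow from four totals and the three constants. A minimal sketch, assuming the totals of Table 167 and the constants found above (variable names are ours):

```python
import math

N = 17
SUM_Y = 34.55
SUM_XY = 3504.60
SUM_X2Y = 485144.57
SUM_Y2 = 74.0683
A, B, C = 0.953795, 0.0160595, -0.0000346996

# Sum of squares of the computed values, without computing each Yc.
sum_yc_sq = A * SUM_Y + B * SUM_XY + C * SUM_X2Y

# Standard error of estimate (dollars).
sigma_ys = math.sqrt((SUM_Y2 - sum_yc_sq) / N)

# Index of determination and index of correlation.
correction = SUM_Y ** 2 / N     # mean of Y times sum of Y
rho_sq = (sum_yc_sq - correction) / (SUM_Y2 - correction)
rho = math.sqrt(rho_sq)
print(round(sum_yc_sq, 4), round(sigma_ys, 3), round(rho_sq, 4), round(rho, 3))
```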

Test of fitness of equation type. Although the reliability of a curvilinear 
correlation coefficient will be considered in the final section of this chapter, 
it is worth while to consider at this point whether the increase in the corre- 
lation by the introduction of an additional constant in the estimating 
equation is a significant increase. 

Now it has been found that 

$$\rho^2 = \frac{2.18371}{3.85061} = .5671; \text{ and } \rho = .753.$$ 

But use of the straight line equation gives these results: 

$$r^2 = \frac{2.02893}{3.85061} = .5269; \text{ and } r = +.726.$$ 

(In the above expression, $2.02893 = \sum y_c^2$ for a linear equation, the 
computation of which is not shown. It may be obtained from the data given 
in Table 167.) We may discover whether ρ is significantly higher than r 
by application of the analysis of variance technique which was outlined in 
Chapter XIII. 

We may summarize our results from using a straight line estimating 
equation as follows: 

Source of variation               Amount of     Degrees of    Variance 
                                  variation     freedom 
Explained by straight line         2.02893          1          2.02893 
Unexplained by straight line       1.82168         15           .12145 
Total                              3.85061         16           .24066 

From this summary we see that the squared residuals from the straight 
line total 1.82168. We may now inquire how much of this residual 
variation is explained by the introduction of another constant into the 
estimating equation. This is most easily done by: (1) subtracting 2.18371 (the 
variation explained by the second degree curve) from 3.85061 ($\sum y^2$) in 
order to obtain the unexplained variation, 1.66690, after use of the second 
degree curve; (2) subtracting this amount (1.66690) from 1.82168 (the 
unexplained variation as measured from the straight line), giving .15478, 
the increment explained by the second degree curve. 

Let us summarize these results also: 

Source of variation                       Amount of     Degrees of    Variance 
                                          variation     freedom 
Increment explained by additional 
  constant in equation                      .15478          1           .15478 
Unexplained by second degree curve         1.66690         14           .11906 
Total unexplained by straight line         1.82168         15           .12145 


A word of explanation concerning the determination of the degrees of 
freedom will be given. There are 17 items. However, since total 
variation is measured from the mean, one degree of freedom is lost. That is, 
arbitrary values may be assigned to any 16 of the 17 residuals, but the 
value of the other one is determined by the requirement that the sum of the 
deviations from the mean be zero. On the other hand, a straight line uses up 
an additional degree of freedom, or two altogether, since there are two 
constants in the equation (that is, a and b). Thus there are 17 - 2 = 15 
degrees of freedom remaining for the residuals from the straight line, as 
shown in the first table. This leaves 16 - 15 = 1 degree of freedom for 
the $y_c$ values. The second table shows 14 degrees of freedom for the 
deviations from the second degree curve, since the latter uses up three 
degrees of freedom on account of the constants a, b, and c. Deviations 
from their own mean of computed values obtained by the use of a second 
degree curve have 2 degrees of freedom, but as indicated in the above 
table, this is only 1 degree of freedom in addition to the single degree of 
freedom possessed by the deviations from their own mean of the computed 
values of a straight line. 

Possibly the table below will further clarify the nature of the 
different measures of variation into which the total variation has been 
divided. 

Source of variation                       Amount of     Degrees of    Variance 
                                          variation     freedom 
Explained by straight line                 2.02893          1          2.02893 
Increment due to additional constant        .15478          1           .15478 
Total explained by second degree curve     2.18371          2          1.09186 
Unexplained by second degree curve         1.66690         14           .11906 
Total                                      3.85061         16           .24066 

We now want to ascertain if the explained variance attributable to the 
addition of a third constant is significant in relation to the remaining 
unexplained variance. Thus 

$$F = \frac{.15478}{.11906} = 1.300.$$ 

Our F table (Appendix G2) indicates that, when $n_1 = 1$ and $n_2 = 14$, 
the .05 level of significance requires that F = 4.600. It is clear that the 
increase in correlation brought about by the addition of another constant 
to our equation is not significant. 
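
The test just made reduces to three subtractions and two divisions. The sketch below assumes the variations quoted in the summaries above and takes the critical value 4.600 from the text rather than from a statistical library; variable names are ours.

```python
N = 17
TOTAL_VARIATION = 3.85061     # sum of squared deviations of Y from its mean
EXPLAINED_BY_LINE = 2.02893   # explained by the straight line
EXPLAINED_BY_CURVE = 2.18371  # explained by the second degree curve
F_CRITICAL_05 = 4.600         # from the text, for 1 and 14 degrees of freedom

# Increment explained by the additional constant (1 degree of freedom).
increment = EXPLAINED_BY_CURVE - EXPLAINED_BY_LINE

# Variation left unexplained by the curve (N - 3 degrees of freedom).
unexplained = TOTAL_VARIATION - EXPLAINED_BY_CURVE
unexplained_variance = unexplained / (N - 3)

f_ratio = (increment / 1) / unexplained_variance
significant = f_ratio > F_CRITICAL_05
print(round(increment, 5), round(unexplained_variance, 5),
      round(f_ratio, 3), significant)
```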

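Estimate of population correlation. The best estimate of ρ for the 
population is 

$$\bar{\rho}^2 = 1 - \frac{\text{unexplained variance}}{\text{total variance}} = \frac{\rho^2(N - 1) - (m - 1)}{N - m},$$ 

where m is the number of constants in the estimating equation. 
Using the first of the two expressions, since the variances needed are 
given above, 

$$\bar{\rho}^2 = 1 - \frac{.11906}{.24066} = .5053.$$ 

$$\bar{\rho} = .711.$$ 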

A similar correction for size of sample for r from the same data gives 
$\bar{r} = +.704$. Although $\bar{\rho}$ is larger than $\bar{r}$, the difference is not great. The 
result of this comparison should not be considered as conflicting with the 
analysis of variance test. The mere failure to establish a significant 
difference does not prove that the difference is accidental. It is still our 
best guess that the relationship between the two variables is curvilinear, 
but the linear hypothesis is by no means discredited. 
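
The correction for size of sample may be applied to both coefficients with one small function. This sketch assumes the second of the two expressions given above, with m taken as the number of constants in the estimating equation (3 for the second degree curve, 2 for the straight line); the function name `adjusted_index_sq` is ours.

```python
import math

def adjusted_index_sq(index_sq, n, m):
    """Squared correlation corrected for sample size n and m constants."""
    return (index_sq * (n - 1) - (m - 1)) / (n - m)

N = 17
rho_bar_sq = adjusted_index_sq(0.5671, N, 3)   # second degree curve
r_bar_sq = adjusted_index_sq(0.5269, N, 2)     # straight line

rho_bar = math.sqrt(rho_bar_sq)
r_bar = math.sqrt(r_bar_sq)

# The first expression (one minus the ratio of variances) should agree.
from_variances = 1 - 0.11906 / 0.24066
print(round(rho_bar_sq, 4), round(rho_bar, 3), round(r_bar, 3),
      round(from_variances, 4))
```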

Third degree curve. As an illustration of the law of diminishing returns 
we shall use data derived from experiments with nitrogen fertilizer and 
tobacco yield at Tipton, Georgia. One thousand pounds of fertilizer per 
acre were applied to five different plots. Of the active ingredients, 
phosphoric acid and potash were held constant at 8 per cent and 5 per cent 
respectively; and the nitrogen was made to vary as follows: none, 2 per 
cent, 3 per cent, 4 per cent, 5 per cent. Presumably the experiment was 
so conducted that differences in yield were not attributable to differences 
in soil fertility, drainage, etc., between plots. The experiment was 
repeated in three different years. Of the total variance, what proportion 
can be explained by the varying amount of nitrogen used? While it is 
possible that the experiment was not perfectly designed, the data indicate 
almost perfect correlation when the relationship is assumed to be of the 
equation type 

$$Y_c = a + bX + cX^2 + dX^3.$$ 

This can be roughly verified by inspection of the scatter diagram, Chart 
228. The heavy horizontal lines are the average yields for each of the 
percentages of nitrogen which are given. These means are not necessary 
for the solution of the problem, but are useful in discovering the type of 
curve to fit. 

Chart 228. Per Cent Nitrogen in Fertilizer and Yield Per Acre of Tobacco, at Tipton, 
Georgia. (Horizontal lines indicate average yield per acre for each percentage of 
nitrogen, while curve represents computed values from equation Yc = 890.32389 
+ 78.263630X + 20.323899X² - 4.4648847X³. Data of Table 168.) 
[Vertical axis: yield in pounds.] 
Solution of normal equations. Since four constants must be found, four 
normal equations of the following type must be used:⁵ 

$$\text{I.}\quad \sum Y = Na + b\sum X + c\sum X^2 + d\sum X^3;$$ 

$$\text{II.}\quad \sum XY = a\sum X + b\sum X^2 + c\sum X^3 + d\sum X^4;$$ 

$$\text{III.}\quad \sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4 + d\sum X^5;$$ 

$$\text{IV.}\quad \sum X^3Y = a\sum X^3 + b\sum X^4 + c\sum X^5 + d\sum X^6.$$ 

The values required are computed in Table 168, and their substitutions 
result in the following normal equations: 

I. 16,934 = 15a + 42b + 162c + 672d; 

II. 50,630 = 42a + 162b + 672c + 2,934d; 

III. 197,198 = 162a + 672b + 2,934c + 13,272d; 

IV. 822,884 = 672a + 2,934b + 13,272c + 61,542d. 

Following our previous procedure we may solve together equations I and 
II; II and III; III and IV, in each case eliminating a. We now have 
three equations: 

A. 48,222 = 666b + 3,276c + 15,786d; 

B. 80,256 = 1,980b + 14,364c + 82,116d; 

C. 790,152 = 23,724b + 178,416c + 1,051,020d. 

We may now solve together A and B; B and C, eliminating b. The 
equations are thus reduced to two: 

D. -42,029,064 = 3,079,944c + 23,432,976d; 

E. -339,492,384 = 12,492,144c + 132,899,616d. 

Solving equations D and E simultaneously, we find that d = -4.4648847 
and c = 20.323899. By substituting these values in equation A, B, or C, 
we find that b = 78.263630. Substituting the values found for b, c, and d 
in equation I, II, III, or IV, we find a to have a value of 890.32389. It 
is advisable to check the values of d, c, b, and a at each step, since any 
error made in the early stages will vitiate all subsequent computations. 
One method of checking is to calculate each of the constants twice, by 
substituting in two different equations. Possibly even better is to 
substitute all of the constants known at any time in one of the remaining 
equations. For instance, if the value of a has been found by substituting 

5 Had observations been taken for 1 per cent nitrogen, the origin could conveniently 
have been taken at the mean of the X values (2.5). Then the sum of the odd powers 
of X would have been zero, and would have disappeared from the normal equations. 
We should then have had two pairs of normal equations to solve simultaneously: 

$$\text{I.}\quad \sum Y = Na + c\sum X^2; \qquad \text{II.}\quad \sum XY = b\sum X^2 + d\sum X^4;$$ 

$$\text{III.}\quad \sum X^2Y = a\sum X^2 + c\sum X^4; \qquad \text{IV.}\quad \sum X^3Y = b\sum X^4 + d\sum X^6.$$ 

The burden of computation would have been materially lightened. 



TABLE 168 

Computation of Values Required to Obtain Measures of Relationship Between Per Cent 
Nitrogen in Fertilizer and Yield per Acre of Tobacco, Tipton, Georgia 

(Fertilizer is 1,000 pounds per acre; P₂O₅ and K₂O are 8 per cent and 6 per cent respectively. 
The yields on all plots were unusually high in 1925; consequently they 
were reduced by a factor which reduced their average to the average of 1924 and 1926.) 

$\bar{Y} = 1{,}128.933$ pounds. 
values of b, c, and d in equation I, a final check may be made by 
substituting a, b, c, and d in equation IV. Thus 

822,884 = 672(890.32389) + 2,934(78.263630) + 13,272(20.323899) 
+ 61,542(-4.4648847) 
= 598,297.65 + 229,625.49 + 269,738.79 - 274,777.93 
= 822,884.00. 

The estimating equation, then, is 

$$Y_c = 890.32389 + 78.263630X + 20.323899X^2 - 4.4648847X^3.$$ 
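
Since the coefficients of the four normal equations are whole numbers, the solution can be verified without any rounding at all. The sketch below is not the elimination scheme of the text but ordinary Gaussian elimination carried out in exact fractions; the function name `solve_exact` is ours, and the constant term of equation II is taken as 50,630, the value consistent with equation A (15 times II less 42 times I gives 48,222).

```python
from fractions import Fraction

def solve_exact(matrix, rhs):
    """Gaussian elimination with back substitution in exact fractions."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)]
           for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into position.
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    solution = [Fraction(0)] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution

# Coefficients of a, b, c, d in normal equations I to IV.
coefficients = [
    [15, 42, 162, 672],
    [42, 162, 672, 2934],
    [162, 672, 2934, 13272],
    [672, 2934, 13272, 61542],
]
constants = [16934, 50630, 197198, 822884]

a, b, c, d = (float(v) for v in solve_exact(coefficients, constants))
print(a, b, c, d)
```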

Doolittle method. It must be confessed that, when there are as many as four 
equations to solve simultaneously, the above procedure is somewhat laborious. 
Furthermore, no check can be applied until the value of d is obtained. Even that does not 
check the accuracy of any work except the solution of the two equations (D and E) 
necessary to obtain c and d. All of the preceding work could have been honeycombed 
with errors and still the solution of these two equations would check. It is not until 
all of the constants are obtained that we have any real check on the accuracy of the 
solution of the four normal equations. If the final check fails, all of the work must 
be repeated. 

Fortunately there is available for solving equations of this type simultaneously a 
systematic method that provides frequent checks on accuracy, and is less laborious 
than the above procedure when there are four or more equations. It is known as the 
Doolittle method, having been developed by M. H. Doolittle. Like many labor-saving 
devices in statistics, the method at first seems very confusing. To a certain extent 
there is a substitution of complexity of procedure for repetitive drudgery. 

The Doolittle method is illustrated by Table 169. There are five parts to this table: 

Part 1. Normal equations. These are the same equations that are given above, 
but all of the terms have been put on the left side, so that each equation equals zero. 

Part 2. Forward solution. This solution obtains a value for d (-4.4648919, found 
in row IV', column C2), and provides the figures with which to obtain values for the 
other constants. 

Part 3. Back solution. In this part we compute by a simple process the values, in 
turn, for c, b, and a. 

Part 4. Estimating equation. Note that this equation agrees, to five digits, with 
the one previously obtained. 

Part 5. Check equation. By substituting the values of the constants obtained in 
the last normal equation, the preceding work is checked. This step involves nothing 
new. 

The entries in the forward solution are the most confusing, but if the procedure and 
explanation outlined below are followed very carefully, no trouble will be experienced 
in applying the Doolittle method to the solution of equations of this type. It is 
desirable that work be done in pencil first. This will permit some of the entries to be 
made in boldface, as indicated in Table 169, merely by converting the pencil figures 
into ink. The steps in the forward solution are as follows: 

1. Divide the forward solution table into as many sections as there are normal 
equations. Leave a space between sections, and separate also by a horizontal line as shown. 
Allow in each section two more rows than the section number, except that section one 
requires only two rows, rather than three. 
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2. Label the columns: (1), (2), (3), (4), 0, and Check total Five constants would 
require five normal equations, and therefore a column (5) also Enter also the descrip- 
tive matter in the stub as shown in Table 169 

3. Record the appropriate normal equation coefficients in the first row of each
section, being sure to indicate minus signs.

4. Total each normal equation algebraically; record the results in the last column.

5. Make the following entries in the last row of each section: 

-1.00000000 in row I', column (1);

-1.0000000 in row II', column (2);

-1.000000 in row III', column (3);

-1.00000 in row IV', column (4).

The number of zeros after 1 indicates the minimum number of decimal places to carry 
computations in each section. The reason for dropping an additional decimal place as 
computations proceed from section to section is that errors from rounding the figures 
cumulate, and the number of significant places becomes smaller. It is advisable, 
however, never to record fewer than eight digits, including the decimal places. 

6. Row I' is the result of dividing row $\Sigma$I by the number in cell $\Sigma$I(1) and
changing signs. The sum of the first five entries in this row should be checked against
the entry in the total column, and agreement indicated by a check mark. Values in columns
(2), (3), (4), and 0 of this row should be entered in boldface, as further use is to be
made of them. (As suggested above, this is most easily done by reinforcing the original
pencil entries with ink.)

7. The entries in the second row of section II, which is labeled $\Sigma$I × I'(2), are a
result of multiplying the items in row $\Sigma$I by the number (in boldface) in the cell
which is an intersection of row I' and column (2). In similar fashion, immediately below
each row of normal equation coefficients are found the corresponding "product" rows.
These rows are called product rows because they are the result of making multiplications,
a description of each such operation being given in the stub of the table. It helps to
keep the process straight if we observe that the multipliers are always the boldface
numbers in the column bearing the same parenthesized number as the section being
computed; and that the numbers multiplied are those in the row immediately above the
boldface number in question. A check on the accuracy of these entries is afforded by
totaling each row as it is computed, and indicating by a check mark agreement with
the entry in the total column.

8. The third row of section II, labeled $\Sigma$II, is the result of adding algebraically
the two rows above it in that section. Likewise the $\Sigma$ row in each section is a
vertical summation of all the entries above the $\Sigma$ row in the section in question.
There is no separate $\Sigma$ row in section I, since the section has no product row, and
therefore the normal equation row automatically becomes also the $\Sigma$ row. Note that,
as the computations proceed from section to section, there is an increase in the number
of spaces in this row that are left vacant because the entries have become zero. These
$\Sigma$ rows also should be added horizontally to obtain a check with the total column.

9. Row II' is the result of dividing row $\Sigma$II by the value in $\Sigma$II(2) and
changing signs. So also each "prime" row (III', IV', etc.) is obtained by dividing each
item in a given $\Sigma$ row by the first entry in that row, with sign changed. It is
because of this fact that the first entry is always -1. This entry is perhaps a sufficient
description to remind us of the nature of the operation. The prime rows should also check
with the total column. After the check has been made, enter the numbers to the right of
each -1 in ink, up to, but not including, the total column entry.

The preceding explanation has referred specifically to the steps involved in sections
I and II. The other sections are computed in similar fashion, each section requiring
the previous computation of the other sections. The only variation among the different
sections lies in the number of product rows and the number of vacant spaces to the
left in some of the rows. As previously noted, we have obtained (in cell IV' 0) the
value of d, which is -4.4648919. We are now ready to proceed with the back solution
to obtain a, b, and c.

The back solution occasions no difficulty. It consists merely in substituting the values
of the constants, as obtained, in the derived equations III', II', and I'. The entries
in the 1 column are the boldface items in column 0 of the forward solution table. The
item in the last row of this column (-4.4648919) is d. This value is recorded in the
last row of the total column. The entries in the d column are the boldface items of
column (4), above, multiplied by -4.4648919 (the value of d). The sum of the
items in the third row is c (33.970002 - 13.646047 = 20.323955), which is entered
in the total column, opposite c. The entries in the c column are the boldface items
of column (3), above, multiplied by c. The sum of the items in the second row is b.
The entry in the b column is the boldface entry in column (2), above, multiplied by b.
The sum of the items in the first row is a. It will be noticed that, in using the back
solution table, we record the column to the right first and then proceed to the left;
and in the total column we proceed from bottom to top. Proceeding in this fashion
is rather unusual, but most convenient in this case.

The estimating equation arrived at by the Doolittle method, 

$$Y_c = 890.32391 + 78.263524X + 20.323955X^2 - 4.4648919X^3,$$

agrees to at least five digits with the equation previously obtained on page 716. 
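
The arithmetic that the Doolittle layout organizes can be cross-checked with a short Python sketch. This is a minimal illustration, not the tabular procedure itself: it uses ordinary forward elimination and back substitution and omits the check column. It assumes that the power sums of X follow from three observations at each of X = 0, 2, 3, 4, 5, an assumption that reproduces the sums quoted in the text. The function name `solve_normal_equations` is illustrative only.

```python
def solve_normal_equations(coeffs, rhs):
    """Solve a small symmetric system by forward elimination and back substitution."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(coeffs, rhs)]
    # Forward solution: clear the entries below each leading coefficient.
    for k in range(n):
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    # Back solution: obtain the last constant first, then work upward.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        known = sum(aug[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (aug[k][n] - known) / aug[k][k]
    return x


# Assumed design: three observations at each nitrogen level.
levels = [0, 2, 3, 4, 5]
power_sum = [3 * sum(x ** p for x in levels) for p in range(7)]  # sums of X^0 .. X^6
coeffs = [[power_sum[i + j] for j in range(4)] for i in range(4)]
rhs = [16934, 50630, 197198, 822884]  # sums of Y, XY, X^2 Y, X^3 Y quoted in the text

a, b, c, d = solve_normal_equations(coeffs, rhs)
print(a, b, c, d)
```

As in the table, the last constant (d) is obtained first and the others follow by substitution in reverse order.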

In the right-hand column of the Doolittle back solution table is provided a convenient
place for computation of the explained sum of squares by the usual expression

$$\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY + c\Sigma X^2Y + d\Sigma X^3Y.$$

Note also that $\Sigma Y$, $\Sigma XY$, $\Sigma X^2Y$, and $\Sigma X^3Y$ (with signs
changed) are found in column 0 of the forward solution table, the first row of each
section, in that order from top to bottom; while a, b, c, and d are arranged, in
corresponding order, in the left-hand part of the back solution table. The computations
show that

$$\Sigma Y_c^2 = 19,372,982.$$

Using the equation previously obtained,

$$Y_c = 890.32389 + 78.263630X + 20.323899X^2 - 4.4648847X^3,$$

the computation of $Y_c$ values is as follows:

  X       a + bX        cX^2         dX^3      Y_c (pounds)

  0       890.324        0            0           890.32
  1       968.588       20.324       -4.465       984.45
  2     1,046.851       81.296      -35.719     1,092.43
  3     1,125.115      182.915     -120.552     1,187.48
  4     1,203.378      325.182     -285.753     1,242.81
  5     1,281.642      508.097     -558.111     1,231.63

As can be seen from Chart 228, there is a point of inflection at about 1.5
per cent nitrogen, and the curve reaches a maximum of nearly 1,250 pounds
shortly after the nitrogen reaches 4 per cent. These are, respectively, the
points of diminishing marginal returns and diminishing total returns. How
to locate these points more exactly is explained in Appendix B, section
XXIII-2.
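
The column of estimated yields and the two turning points described above can be checked with a few lines of Python. This is only a sketch using the constants of the estimating equation as printed; the names `estimated_yield`, `inflection`, and `x_at_maximum` are illustrative, and the calculus shortcut shown here is not necessarily the procedure of Appendix B.

```python
import math

A, B, C, D = 890.32389, 78.263630, 20.323899, -4.4648847


def estimated_yield(x):
    """Estimated yield in pounds for x per cent nitrogen."""
    return A + B * x + C * x ** 2 + D * x ** 3


# Reproduce the Y_c column for X = 0, 1, ..., 5.
table = {x: round(estimated_yield(x), 2) for x in range(6)}

# Point of inflection: the second derivative 2C + 6DX vanishes.
inflection = -C / (3 * D)

# Maximum: the first derivative B + 2CX + 3DX^2 vanishes. With D negative,
# the root below is the positive one, where the slope changes from plus to minus.
discriminant = (2 * C) ** 2 - 4 * (3 * D) * B
x_at_maximum = (-2 * C - math.sqrt(discriminant)) / (2 * 3 * D)
maximum_yield = estimated_yield(x_at_maximum)

print(table)
print(round(inflection, 2), round(x_at_maximum, 2), round(maximum_yield, 1))
```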

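$$\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY + c\Sigma X^2Y + d\Sigma X^3Y$$

$$= 890.32389(16,934) + 78.263630(50,630) + 20.323899(197,198) - 4.4648847(822,884)$$

$$= 19,372,981.$$

Values of $\sigma_{ys}$ and $\rho$ are obtained in the usual fashion:

$$\sigma_{ys}^2 = \frac{\Sigma Y^2 - (a\Sigma Y + b\Sigma XY + c\Sigma X^2Y + d\Sigma X^3Y)}{N} = \frac{\Sigma Y^2 - \Sigma Y_c^2}{N} = \frac{19,377,528 - 19,372,981}{15} = \frac{4,547}{15};$$

$$\sigma_{ys} = 17.41 \text{ pounds}.$$

$$\rho^2 = \frac{\Sigma Y_c^2 - \bar{Y}\Sigma Y}{\Sigma Y^2 - \bar{Y}\Sigma Y} = \frac{19,372,981 - (1,128.933)(16,934)}{19,377,528 - (1,128.933)(16,934)}$$

$$= \frac{19,372,981 - 19,117,357}{19,377,528 - 19,117,357} = \frac{255,624}{260,171} = .9825;$$

$$\rho = .9912.$$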
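
The two measures just obtained depend only on four quantities, so they are easy to verify. The sketch below assumes nothing beyond the sums quoted above; the variable names are illustrative, and `sigma_ys` follows the notation used here for the standard error of estimate.

```python
import math

N = 15
sum_y = 16934            # sum of Y
sum_y_squared = 19377528  # sum of Y^2
explained_ss = 19372981   # a*sum(Y) + b*sum(XY) + c*sum(X^2 Y) + d*sum(X^3 Y)

# Correction factor: mean of Y times sum of Y.
correction = sum_y ** 2 / N

# Standard error of estimate: root of the unexplained variation per observation.
sigma_ys = math.sqrt((sum_y_squared - explained_ss) / N)

# Index of determination and index of correlation.
rho_squared = (explained_ss - correction) / (sum_y_squared - correction)
rho = math.sqrt(rho_squared)

print(round(sigma_ys, 2), round(rho_squared, 4), round(rho, 4))
```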

Grouped data. As an illustration of fitting a second degree curve to
grouped data we shall take the relationship found to exist in East-Central
Illinois between the yield per acre of broom corn and the man hours
expended per ton in harvesting the crop. The data are shown in Table 170
and have been plotted in Chart 229. The horizontal line in each column
is its mean. Inspection of the position of these lines reveals that labor
costs decline rapidly at first as the quality of the land improves, but
eventually tend to become constant. Although the use of reciprocals or
logarithms might yield good results, for purposes of illustration we shall use a
curve of the type

$$Y_c = a + bX + cX^2.$$

Examination of Chart 229 shows that b will be negative and c positive.
To facilitate computation, we may designate the origin to be X = 633.33,
Y = 112.50; and we shall compute the equation first in terms of class
intervals. The X interval is 66.67, and the Y interval 25. With that
origin and in those units, the estimating equation takes this form:

$$d'_{Y_c} = a + b\,d'_X + c(d'_X)^2,$$

where, as in earlier chapters, d' refers to a deviation from an arbitrary
origin in terms of class intervals.

TABLE 170

Correlation Table for Computation of Values Required for Measures
of Relationship Between Tons per Acre and Man Hours per Ton
Required in Harvesting Broom Corn

(First six X classes. Rows: man hours per ton, Y. Columns: tons of broom corn per
acre, X, shown by class limits with mid-values in parentheses. In each occupied cell
the upper line gives $d'_X d'_Y$ and $(d'_X)^2 d'_Y$, the middle line the frequency,
and the lower line $f d'_X d'_Y$ and $f(d'_X)^2 d'_Y$.)

Y class limits    Mid-     133.34 to   200.00 to   266.67 to   333.34 to   400.00 to   466.67 to
                  value    199.99      266.66      333.33      399.99      466.66      533.33
                           (166.67)    (233.33)    (300.00)    (366.67)    (433.33)    (500.00)

250.00 to 274.99  262.50   -42  294    -36  216
                               1           1
                           -42  294    -36  216

225.00 to 249.99  237.50

200.00 to 224.99  212.50

175.00 to 199.99  187.50               -18  108                -12   48     -9   27
                                           1                       1           1
                                       -18  108                -12   48     -9   27

150.00 to 174.99  162.50                           -10   50                 -6   18
                                                       1                       1
                                                   -10   50                 -6   18

125.00 to 149.99  137.50                                        -4   16     -3    9     -2    4
                                                                   3           4           3
                                                               -12   48    -12   36     -6   12

100.00 to 124.99  112.50                                                     0    0      0    0
                                                                               1           7
                                                                             0    0      0    0

 75.00 to  99.99   87.50                                                                 2   -4
                                                                                           2
                                                                                         4   -8

 50.00 to  74.99   62.50

 25.00 to  49.99   37.50

d'_X                           -7          -6          -5          -4          -3          -2
(d'_X)^2                       49          36          25          16           9           4
f_X                             1           2           1           4           7          12
f_X d'_X                       -7         -12          -5         -16         -21         -24
f_X (d'_X)^2                   49          72          25          64          63          48
f_X (d'_X)^3                 -343        -432        -125        -256        -189         -96
f_X (d'_X)^4                2,401       2,592         625       1,024         567         192

Source: See Chart 229.

The correlation table shown as Table 170 is an extension of the form
used in simple correlation (Table 161). It is slightly more complex on
account of the additional values which must be computed in order to
obtain the constants for a second degree curve. Specifically, the following
additional values must be computed:

$$\Sigma f_X(d'_X)^3, \qquad \Sigma f_X(d'_X)^4, \qquad \Sigma f(d'_X)^2 d'_Y.$$

The numbers in the center of each cell indicate the number of farms
that fall within the different cell boundaries. The upper left-hand corner
of each cell represents the products of $d'_X d'_Y$, as in a simple correlation
table. The first and third quadrants are positive, while the second and
fourth are negative. The $(d'_X)^2$ row is useful in order to obtain the
$(d'_X)^2 d'_Y$ products, which are recorded in the upper right-hand corner of
each cell (note insert opposite). As the reader can easily verify, the first
and second quadrants must be positive, and the third and fourth negative.
The values in the lower left-hand and lower right-hand corners of each cell
are the $f d'_X d'_Y$ and $f(d'_X)^2 d'_Y$ values respectively. They are obtained
by multiplying the numbers in the upper corners by the cell frequency.
Finally, in the extreme lower right-hand corner of Table 170 are recorded
$\Sigma f d'_X d'_Y$, the sum of the $f d'_X d'_Y$ values; and also
$\Sigma f(d'_X)^2 d'_Y$, the sum of the values which were recorded in the
lower right-hand corner of each cell. In obtaining these totals, care must
be exercised to add algebraically, that is, to add the plus values and
subtract the minus values.
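
The cell arithmetic just described can be sketched in Python. The block below is an illustration under stated assumptions: it covers only the occupied cells of the six leftmost X classes of Table 170, and the $d'_Y$ value attached to each cell is inferred from the printed products rather than read from the table stub. The totals it prints are therefore partial sums for those six columns, not the totals for the whole table.

```python
# Each tuple is (d'_X, d'_Y, frequency) for one occupied cell.
# Assumption: d'_Y values are inferred from the printed corner products.
cells = [
    (-7, 6, 1), (-6, 6, 1),
    (-6, 3, 1), (-4, 3, 1), (-3, 3, 1),
    (-5, 2, 1), (-3, 2, 1),
    (-4, 1, 3), (-3, 1, 4), (-2, 1, 3),
    (-3, 0, 1), (-2, 0, 7),
    (-2, -1, 2),
]

partial_sum_f_dx_dy = 0
partial_sum_f_dx2_dy = 0
column_frequency = {}

for dx, dy, f in cells:
    upper_left = dx * dy        # d'_X d'_Y
    upper_right = dx ** 2 * dy  # (d'_X)^2 d'_Y
    # Lower corners: upper corners multiplied by the cell frequency,
    # added algebraically (signs retained).
    partial_sum_f_dx_dy += f * upper_left
    partial_sum_f_dx2_dy += f * upper_right
    column_frequency[dx] = column_frequency.get(dx, 0) + f

# Column (marginal) rows at the foot of the table.
f_dx = {dx: f * dx for dx, f in column_frequency.items()}
f_dx3 = {dx: f * dx ** 3 for dx, f in column_frequency.items()}

print(column_frequency)
print(f_dx, f_dx3)
print(partial_sum_f_dx_dy, partial_sum_f_dx2_dy)
```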

This general type of correlation table can be extended for use with curves 
of higher degree also, but the added complexity and additional space re- 
quired ultimately limit the practicability of the device. 

The normal equations for the estimating equation are:

$$\text{I.}\quad \Sigma f_Y d'_Y = Na + b\Sigma f_X d'_X + c\Sigma f_X(d'_X)^2;$$

$$\text{II.}\quad \Sigma f d'_X d'_Y = a\Sigma f_X d'_X + b\Sigma f_X(d'_X)^2 + c\Sigma f_X(d'_X)^3;$$

$$\text{III.}\quad \Sigma f(d'_X)^2 d'_Y = a\Sigma f_X(d'_X)^2 + b\Sigma f_X(d'_X)^3 + c\Sigma f_X(d'_X)^4.$$

Making substitutions from Table 170, we have:

$$\text{I.}\quad -16 = 103a - 32b + 482c;$$

$$\text{II.}\quad -238 = -32a + 482b - 1,058c;$$

$$\text{III.}\quad 662 = 482a - 1,058b + 8,630c.$$

From these normal equations the estimating equation is

$$d'_{Y_c} = -.55290435 - .40268355\,d'_X + .058222561(d'_X)^2.$$
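
With only three constants the normal equations can be solved directly. The following sketch uses Cramer's rule with exact rational arithmetic, which is a convenience chosen for this illustration rather than the method of the text; the helper names `det3` and `solve3` are hypothetical.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(matrix, rhs):
    """Solve three linear equations by Cramer's rule."""
    base = det3(matrix)
    solution = []
    for col in range(3):
        # Replace one column by the right-hand side and take the ratio of determinants.
        replaced = [row[:col] + [rhs[i]] + row[col + 1:] for i, row in enumerate(matrix)]
        solution.append(det3(replaced) / base)
    return solution


# Coefficients of a, b, c in equations I, II, III, and the left-hand totals.
matrix = [[Fraction(v) for v in row] for row in
          [[103, -32, 482], [-32, 482, -1058], [482, -1058, 8630]]]
rhs = [Fraction(-16), Fraction(-238), Fraction(662)]

a, b, c = (float(v) for v in solve3(matrix, rhs))
print(a, b, c)
```

Note that the coefficient of a in equation II carries a minus sign, since it is the same sum ($\Sigma f_X d'_X = -32$) that appears as the coefficient of b in equation I.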

The origin of this equation is $X_d$, $Y_d$, and the units in which it is stated
are X intervals and Y intervals; that is, the origin is X = 633.33, Y = 112.50,
the X units 66.67 tons, and the Y units 25 man hours. $Y_c$ values
may be computed directly from this equation,* as shown by Table 171.

[Chart 229. Vertical axis: man hours per ton. Fitted curve:
$Y_c = 325.6794 - .5658420X + .0003275019X^2$. Data have been adapted from a chart
on page 27 of An Economic Study of Broom Corn Production, by R. S. Washburn and
J. H. Martin, U. S. Department of Agriculture, Technical Bulletin No. 349,
February 1933.]

* If preferred, the formula can be transformed into the usual form. In this case it
becomes

$$Y_c = 325.6794 - .5658420X + .0003275019X^2.$$

See Appendix B, section XXIII-S

There is some doubt concerning the economic validity of this equation
type for these data because of the fact that the curve begins to turn
imperceptibly upward within the limits of the data.
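
The change of origin and units, and the point at which the fitted parabola turns upward, can be checked as follows. This is a sketch under two assumptions not stated in the text: that the rounded origin 633.33 and interval 66.67 stand for exactly 1900/3 and 200/3, and that the turning point may be located simply by setting the slope to zero. Variable names are illustrative.

```python
# Constants of the equation stated in class intervals from the arbitrary origin.
a_int, b_int, c_int = -0.55290435, -0.40268355, 0.058222561

x_origin, y_origin = 1900 / 3, 112.5   # assumed exact value of 633.33
x_interval, y_interval = 200 / 3, 25.0  # assumed exact value of 66.67

# Substitute d'_X = (X - x_origin) / x_interval and Y_c = y_origin + y_interval * d'_Yc,
# then collect the terms in 1, X, and X^2.
c_usual = y_interval * c_int / x_interval ** 2
b_usual = y_interval * b_int / x_interval - 2 * c_usual * x_origin
a_usual = (y_origin + y_interval * a_int
           - y_interval * b_int * x_origin / x_interval
           + c_usual * x_origin ** 2)

# The slope b + 2cX vanishes at the lowest point of the curve.
x_at_minimum = -b_usual / (2 * c_usual)

print(round(a_usual, 4), round(b_usual, 7), round(c_usual, 10))
print(round(x_at_minimum, 1))
```

Under these assumptions the lowest point falls somewhat below the upper end of the X scale implied by twelve classes of width 66.67 beginning at 133.34, which is consistent with the remark that the curve begins to turn upward within the limits of the data.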

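To obtain $\sigma_{ys}$ we use the expression

$$\sigma_{ys} \text{ (in intervals)} = \sqrt{\frac{\Sigma f_Y(d'_Y)^2 - \Sigma f_Y(d'_{Y_c})^2}{N}},$$

where $\Sigma f_Y(d'_{Y_c})^2 = a\Sigma f_Y d'_Y + b\Sigma f d'_X d'_Y + c\Sigma f(d'_X)^2 d'_Y$.

Substituting, we find

$$\Sigma f_Y(d'_{Y_c})^2 = (-.55290435)(-16) + (-.40268355)(-238) + (.058222561)(662) = 143.22849;$$

and

$$\sigma_{ys} = 25\sqrt{\frac{220 - 143.22849}{103}} = 21.58.$$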


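$\rho$ is perhaps most conveniently obtained as follows:

$$\rho^2 = \frac{\Sigma f_Y(d'_{Y_c})^2 - \dfrac{(\Sigma f_Y d'_Y)^2}{N}}{\Sigma f_Y(d'_Y)^2 - \dfrac{(\Sigma f_Y d'_Y)^2}{N}} = \frac{143.22849 - \dfrac{(-16)^2}{103}}{220 - \dfrac{(-16)^2}{103}} = \frac{140.74305}{217.51456} = .6471;$$

$$\rho = .804.$$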
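
The grouped-data computations of $\sigma_{ys}$ and $\rho$ can be retraced in a few lines. The sketch below takes the constants and totals as printed above; the variable names are illustrative.

```python
import math

a, b, c = -0.55290435, -0.40268355, 0.058222561
N = 103
sum_f_dy = -16        # sum of f_Y d'_Y
sum_f_dx_dy = -238    # sum of f d'_X d'_Y
sum_f_dx2_dy = 662    # sum of f (d'_X)^2 d'_Y
sum_f_dy2 = 220       # sum of f_Y (d'_Y)^2
y_interval = 25

# Explained sum of squares, measured from the arbitrary origin in intervals.
explained = a * sum_f_dy + b * sum_f_dx_dy + c * sum_f_dx2_dy

# Standard error of estimate, converted from intervals to man hours.
sigma_ys = y_interval * math.sqrt((sum_f_dy2 - explained) / N)

# The same correction factor is removed from numerator and denominator.
correction = sum_f_dy ** 2 / N
rho_squared = (explained - correction) / (sum_f_dy2 - correction)

print(round(explained, 5), round(sigma_ys, 2))
print(round(rho_squared, 4), round(math.sqrt(rho_squared), 3))
```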


Use of Means 

Frequently it is difficult to decide, on theoretical or other grounds, the
type of equation to choose. In such cases the statistician may select as
a description of the relationship the average value of the dependent
variable corresponding to different values of the independent variable. The
computation of the measure of degree of relationship, usually called the
correlation ratio, $\eta$, follows the same principle which we have hitherto
used. It is the square root of the proportion of the total variation that
has been explained by the variation of these average values. Two
illustrations of procedure will be helpful.

A simple illustration. The nitrogen data provide an illustration that is
simple in two respects. First, the $\bar{Y}_K$ values (that is, the column means)
to be computed have been determined by the design of the experiment.
We must compute the average yield when the per cent nitrogen is 0, 2,
3, 4, and 5. Second, there are exactly three observations from which to
compute each mean. The chief computations are shown in Table 172, which
is divided into five main boxes (one for each value of X) and a total column.

The sums of the Y values corresponding to each percentage of fertilizer
are recorded in row 1. Average yields are in row 2. These $\bar{Y}_K$ values
are the explained values and, as may be seen from Chart 228, vary only
slightly from the $Y_c$ values previously computed by use of the equation
$Y_c = a + bX + cX^2 + dX^3$. The explained sum of squares
$\sum_1^m N_K\bar{Y}_K^2$ is the sum of the squares of the individual column means
multiplied by the number of observations in each column, or may be obtained more
easily perhaps by the expression $\sum_1^m\left(\bar{Y}_K\sum_1^{N_K}Y\right)$.
This value is computed in row 3, and found to be 19,373,216. The explained
variation is obtained by subtracting the usual correction factor $\bar{Y}\Sigma Y$.
$\Sigma Y$ is shown at the right of row 1 to be 16,934; hence
$\bar{Y} = 16,934/15 = 1,128.9333$, and the correction factor is
1,128.9333 × 16,934 = 19,117,357. The explained variation then is
19,373,216 - 19,117,357 = 255,859. The total variation is
computed exactly the same as with the other types of correlation we have
considered. $\Sigma Y^2$ is obtained from row 4. We may now find the measure
of correlation:*

$$\eta^2 = \frac{\sum_1^m N_K(\bar{Y}_K - \bar{Y})^2}{\Sigma(Y - \bar{Y})^2} = \frac{\sum_1^m\left(\bar{Y}_K\sum_1^{N_K}Y\right) - \bar{Y}\Sigma Y}{\Sigma Y^2 - \bar{Y}\Sigma Y}$$

$$= \frac{19,373,216 - 19,117,357}{19,377,528 - 19,117,357} = \frac{255,859}{260,171} = .9834;$$

$$\eta = .9917.$$
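
The definition of $\eta^2$ as explained variation (between columns) over total variation can be sketched as a small function. The individual yields of Table 172 are not reproduced in this section, so the function is exercised on hypothetical data, labeled as such; the book's own totals are then used to confirm the figure obtained above. The function name `correlation_ratio_squared` is illustrative.

```python
def correlation_ratio_squared(columns):
    """Explained (between-column) variation divided by total variation."""
    values = [y for column in columns for y in column]
    grand_mean = sum(values) / len(values)
    total = sum((y - grand_mean) ** 2 for y in values)
    between = sum(len(column) * (sum(column) / len(column) - grand_mean) ** 2
                  for column in columns)
    return between / total


# Hypothetical data: three columns of three observations each (not from the text).
hypothetical_columns = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
eta_squared_hypothetical = correlation_ratio_squared(hypothetical_columns)

# The same ratio from the totals quoted in the text for the nitrogen data.
correction = 16934 ** 2 / 15
eta_squared_nitrogen = (19373216 - correction) / (19377528 - correction)

print(eta_squared_hypothetical)
print(round(eta_squared_nitrogen, 4), round(eta_squared_nitrogen ** 0.5, 4))
```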


* It has been shown in Appendix B, section XIII-2, that

$$\Sigma(Y - \bar{Y})^2 = \sum_1^m N_K(\bar{Y}_K - \bar{Y})^2 + \sum_1^m\sum_1^{N_K}(Y - \bar{Y}_K)^2,$$

or

$$\Sigma Y^2 - \bar{Y}\Sigma Y = \left[\sum_1^m\left(\bar{Y}_K\sum_1^{N_K}Y\right) - \bar{Y}\Sigma Y\right] + \left[\Sigma Y^2 - \sum_1^m\left(\bar{Y}_K\sum_1^{N_K}Y\right)\right].$$

That is to say, total variation equals variation between columns plus variation within
columns; or, total variation equals explained variation plus unexplained variation. It
is therefore obvious that $\eta^2$ may also easily be computed as 1 - (the ratio of the
unexplained to the total variation).

In this text we are describing $\eta_{yx}$, which uses the means of the columns in
computing the explained variation. Sometimes the means of the rows are used instead, in
which case the correlation ratio is denoted by $\eta_{xy}$. Although $r_{yx} = r_{xy}$,
$\eta_{yx}$ and $\eta_{xy}$ are not in general equal.

The similarity, as well as the difference, between the correlation ratio
and the analysis of variance technique as treated in Chapter XIII should
be mentioned. In the earlier chapter the ratio was between the explained
and the unexplained population variance estimates, while here the ratio
is between the explained and the total variation (or variance, if preferred)
of the sample. Analysis of variance may be used to determine whether
there is any significant relationship between the two variables; $\eta^2$ is used
to measure the degree of relationship, that is, the proportion of variation in the
dependent variable which has been explained by the independent variable.

Correlation in population. It should be recalled that for these data,
$\rho = .9912$. This was increased to $\eta = .9917$ by the use of the five class
means instead of the four constants involved in the estimating equation.
Or, in terms of squared deviations, $\rho^2 = .9825$ and $\eta^2 = .9834$. This does
not necessarily imply that the line of means more nearly represents the
relationship which would be found in the population from which this
sample was drawn, than does the four-constant estimating equation. Each
time we add a constant to an equation, or subdivide the data so as to
obtain another class mean, we reduce the possibility of variation from that
estimating line or line of means. Each added constant or mean sacrifices
a degree of freedom. In the present illustration, $\rho$ sacrifices four degrees
of freedom, while $\eta$ sacrifices five. Now, the more complicated the
relationship assumed, the less the variation which remains unexplained and
the higher the apparent correlation. But since we have only a limited
number of items in a sample, the results are misleading, for sacrificing a
degree of freedom is equivalent to sacrificing an observation. When small
samples are used, the scatter around the line of relationship is apt to be
smaller than for the population, and therefore we shall tend to get
correlation higher than exists in the parent population. However, we have
seen that it is possible to make an estimate concerning the correlation
which may reasonably be expected to obtain in the entire population, by
allowing for the sample size and the degrees of freedom sacrificed. The
easiest formula to use is

$$\bar{r}^2 = \frac{r^2(N - 1) - (m - 1)}{N - m},$$

where N is the number of items in the sample, and m is the number of
constants in the equation or the number of class means, that is, the
number of degrees of freedom sacrificed. This formula was discussed on
page 679, and its derivation is shown in Appendix B, section XXII-6.
It may be used not only for $\bar{r}$, but for $\bar{\rho}$, $\bar{\eta}$, and for the
coefficient of multiple correlation discussed in the following chapter.*


* See Mordecai Ezekiel, Methods of Correlation Analysis, pp. 121-122, 246; John

Using this formula,

$$\bar{\rho}^2 = \frac{.9825(15 - 1) - (4 - 1)}{15 - 4} = .9777;$$

$$\bar{\rho} = .9888.$$

$$\bar{\eta}^2 = \frac{.9834(15 - 1) - (5 - 1)}{15 - 5} = .9768;$$

$$\bar{\eta} = .9883.$$

Thus we see that, by using class means instead of an equation, there has 
been no improvement in our explanation of the relationship between the 
per cent of nitrogen in the fertilizer and the yield of tobacco. Apparently 
there has been a slight retrogression! 
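
The adjustment for degrees of freedom is a one-line function, sketched below with the figures used in this chapter. The function name `adjusted_square` is illustrative; the squared coefficients for the broom corn data (.6471 and .6809, with N = 103 and m = 3 and 12) are those derived elsewhere in the chapter.

```python
import math


def adjusted_square(r_squared, n, m):
    """Population estimate of a squared coefficient: n items, m degrees of freedom sacrificed."""
    return (r_squared * (n - 1) - (m - 1)) / (n - m)


# Nitrogen data: third degree curve (4 constants) versus 5 class means.
rho_bar_sq = adjusted_square(0.9825, 15, 4)
eta_bar_sq = adjusted_square(0.9834, 15, 5)

# Broom corn data: second degree curve (3 constants) versus 12 class means.
rho_bar_sq_corn = adjusted_square(0.6471, 103, 3)
eta_bar_sq_corn = adjusted_square(0.6809, 103, 12)

for value in (rho_bar_sq, eta_bar_sq, rho_bar_sq_corn, eta_bar_sq_corn):
    print(round(value, 4), round(math.sqrt(value), 4))
```

The comparison shows how the small sample-level advantage of the class means disappears once the extra degree of freedom is charged against them.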

Since the population estimate $\bar{\rho}$ is greater than $\bar{\eta}$, no purpose would
be served by testing whether or not $\eta$ is significantly greater than $\rho$. The
procedure, however, is essentially the same as was used on pages 710-712
for testing the significance of the difference between $\rho$ and r. We first
discover the increase in the population variance brought about by using
the line of means rather than the estimating equation. This explained
variance is related to the unexplained variance by means of the z or F test.
The unexplained variance is obtained from the deviations from column
means. An example of this procedure will be given in connection with
the next illustration.

It is sometimes desirable to test whether or not the given data exhibit
a significant departure from linearity. The procedure is exactly the same
as that described above. The increase in explained variance is that
brought about by the use of the column means instead of the straight line
equation.* In the present instance, however, there can be little doubt
that the relationship between fertilizer and yield is non-linear.

The correlation ratio can be used not only when the independent vari- 
able is quantitative, but also when it is qualitative. Simon Kuznets has 

Wiley and Sons, New York, 1930. The formula used above is the same as the one
given by Ezekiel on p. 121, but different symbols are used and the formula has been
put in a form which probably is slightly easier to use.

* Another method of testing whether or not $\eta$ is significantly greater than r is to
compute $\eta^2 - r^2$ and compare this with its standard error,

$$2\sqrt{\frac{\eta^2 - r^2}{N}}\,\sqrt{(1 - \eta^2)^2 - (1 - r^2)^2 + 1}.$$

The significance of this ratio may be roughly ascertained by referring to a table of
normal curve areas. This test is not satisfactory, however, because it does not take
into consideration the number of classes used in computing $\eta$, nor does the sampling
distribution of $\eta^2 - r^2$ always follow the normal curve.



suggested its use to test the validity of a seasonal index, which, it will be
recalled, consists of means of columns of data, each column representing a
separate month. This test is, of course, subject to the same limitations
as is the analysis of variance test. These limitations were mentioned on
pages 497-498.

Data grouped on both axes. The broom corn data used earlier in this
chapter provide an illustration that is more complex in three respects
than the nitrogen data: (1) The number of classes and their limits are
determined, not by the design of the experiment, but by the judgment of
the statistician; (2) the number of items in the different classes varies; (3)
the data have been grouped on the basis of man hours per ton as well as
yield.

Since the data are grouped, we shall proceed to ascertain, first, the
explained variation in intervals and, then, the total variation, also in
intervals. $\eta^2$ will, of course, be the ratio of these two quantities. Table
173 is the computation table. As can be seen, there are thirteen main
boxes in the body, one for each of the twelve classes, and one for the
entire distribution. The box heading for each class indicates the class
limits and mid-value of that class. The section for each of the twelve
classes contains entries necessary to compute the class mean, while that
for the entire series contains entries for computation of the standard
deviation also. As the table shows, 112.5 man hours was arbitrarily taken
as the origin of each column as well as for the entire Y series. Row 1 is
for the totals of the various columns. Symbolically, these totals are $N_K$
and $\sum_1^{N_K} f_Y d'_Y$ for columns corresponding to different X values; and N,
$\Sigma f_Y d'_Y$, and $\Sigma f_Y(d'_Y)^2$ for the entire distribution. Rows 2 and 3
are necessary to obtain the explained sum of squares in intervals from the arbitrary
origin. This value is 150.60065, and is recorded in the last column as the total
of row 3.

The explained variation in terms of intervals is obtained by subtracting
a correction factor from the explained sums of squared deviations as
measured in deviations from the arbitrary origin. That is,

$$\sum_1^m\left[\frac{\left(\sum_1^{N_K} f_Y d'_Y\right)^2}{N_K}\right] - \frac{(\Sigma f_Y d'_Y)^2}{N} = 150.60065 - \frac{(-16)^2}{103} = 148.11521.$$

See Simon S. Kuznets, Seasonal Variations in Industry and Trade, p. 34n, National
Bureau of Economic Research, New York, 1933.

TABLE 173

Computations Required to Compute Correlation Ratio Between Yield per Acre of
Broom Corn and Man Hours per Ton Required for Harvesting in East-Central Illinois
(Yield per acre in tons, X)

The total variation in terms of intervals is*

$$\Sigma f_Y(d'_Y)^2 - \frac{(\Sigma f_Y d'_Y)^2}{N} = 220 - \frac{(-16)^2}{103} = 217.51456.$$


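For the ratio of determination, then, we have

$$\eta^2 = \frac{\sum_1^m\left[\dfrac{\left(\sum_1^{N_K} f_Y d'_Y\right)^2}{N_K}\right] - \dfrac{(\Sigma f_Y d'_Y)^2}{N}}{\Sigma f_Y(d'_Y)^2 - \dfrac{(\Sigma f_Y d'_Y)^2}{N}} = \frac{150.60065 - 2.48544}{220.00000 - 2.48544} = \frac{148.11521}{217.51456} = .6809;$$

$$\eta = .825.$$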

Comparison of $\rho$ and $\eta$. The estimate for the entire population is

$$\bar{\eta}^2 = \frac{.6809(103 - 1) - (12 - 1)}{103 - 12} = .6423;$$

$$\bar{\eta} = .801.$$

Values of $\rho^2$ and $\rho$ from these same data, using a second degree curve
for the estimating equation, were .6471 and .804 respectively. Applying
the same corrections, we find that

$$\bar{\rho}^2 = \frac{.6471(103 - 1) - (3 - 1)}{103 - 3} = .6400;$$

$$\bar{\rho} = .800.$$


It seems likely, therefore, that a second degree curve describes the true 
relationship between yield per acre of broom corn and man hours required 
for harvesting as accurately as does the line of means. 


* These formulas are analogous to those used for ungrouped data and described on
pages 354-355. It must be apparent that if we take the difference between the total
and the explained variation, we have

$$\Sigma f_Y(d'_Y)^2 - \sum_1^m\left[\frac{\left(\sum_1^{N_K} f_Y d'_Y\right)^2}{N_K}\right],$$

which is the unexplained variation in terms of intervals.

This impression is confirmed by an analysis of the variances obtained in
computing $\rho$ and $\eta$. Recall that for these data:

$$\rho^2 = \frac{140.74305}{217.51456} = .6471; \text{ and } \rho = .804.$$

$$\eta^2 = \frac{148.11521}{217.51456} = .6809; \text{ and } \eta = .825.$$


In the table below are summarized the apportionment of the variation,
degrees of freedom, and variance of this problem, each according to its source.

Source of variation                           Amount of    Degrees of    Variance
                                              variation    freedom

Explained by second degree curve              140.74305         2        70.37152
Increment due to use of means                   7.37216         9          .81913
Total explained by use of means               148.11521        11        13.46502
Unexplained variation from means
  (variation within columns)                   69.39935        91          .76263
Total                                         217.51456       102         2.13250


The variation in row 2 (increment due to use of means) as well as the
variation in row 4 (unexplained variation from means) is most easily
obtained by subtractions within the table, although each can be obtained
independently if we so desire. In order to test the improvement in fit
obtained by use of the line of means, we must relate the increment of
explained variance due to the use of these means to the variance
unexplained by these means. Thus

$$F = \frac{.81913}{.76263} = 1.074.$$


Values of F are not stated in Appendix G2 for $n_1 = 9$ and $n_2 = 91$. However,
for the .05 level of significance, with $n_1 = 8$ and $n_2 = 60$, F = 2.097,
which is greater than the computed F value. Apparently considerably
more than five times in one hundred random samples we should expect
improvement as great as that obtained. Clearly the improvement is not
significant. Furthermore, a second degree curve is to be preferred to the
line of means as an empirical description of the relationship between broom
corn yield and man hours required for harvesting, since the relationship is
simpler and indicates continuous, rather than discrete, changes in man
hours per ton required to harvest varying yields per acre.
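
The apportionment of variation and the F ratio above can be reproduced from three sums of squares and the counts involved. The sketch below uses only figures given in this chapter; variable names are illustrative, and no table of F is consulted, so the comparison with a critical value is left to the reader.

```python
n_items = 103
curve_constants = 3   # second degree curve: a, b, c
class_means = 12      # number of X classes

total_variation = 217.51456
explained_by_curve = 140.74305
explained_by_means = 148.11521

# Subtractions within the table.
increment = explained_by_means - explained_by_curve
within_columns = total_variation - explained_by_means

# Degrees of freedom for the increment and for the variation within columns.
df_increment = class_means - curve_constants
df_within = n_items - class_means

variance_increment = increment / df_increment
variance_within = within_columns / df_within
f_ratio = variance_increment / variance_within

print(round(increment, 5), df_increment, round(variance_increment, 5))
print(round(within_columns, 5), df_within, round(variance_within, 5))
print(round(f_ratio, 3))
```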

Limitations of correlation ratio. The reader may already have been
struck with certain rather obvious limitations to the usefulness of the
correlation ratio. In the first place, the data must be grouped according to
some classification of the independent variable. In our nitrogen and crop
yield illustration this grouping was determined by the design of the
experiment, while in the broom corn illustration the grouping was somewhat
arbitrary. In the second place, there is no estimating equation, but only
the line of the means. Thus there is no hypothesis stated concerning the
functional relationship between the variables, and no satisfactory way of
making an estimate of the value of the dependent variable for any given
value of the independent variable. Finally, the value of $\eta$ approaches 1
as the number of columns is increased. This makes it especially important
to estimate the value of $\eta$ for the population. The formula for so doing,
it will be remembered, takes into account not only the size of the sample,
but the number of X groups into which the sample is divided.

Unreliability of Coefficients of Curvilinear Correlation

Approximate measures sometimes used to test the reliability of $\rho$ and $\eta$
are

$$\sigma_\rho = \frac{1 - \rho^2}{\sqrt{N - m}},$$

$$\sigma_\eta = \frac{1 - \eta^2}{\sqrt{N - m}}.$$

In these formulae, m refers to the number of degrees of freedom sacrificed;
that is, the number of constants in the fitted curve for $\rho$, or the number
of X classes in the case of $\eta$. As in the case of r, the distribution of sample
$\rho$'s is approximately normal around $\rho_P$ only if the sample is large and $\rho_P$
is small. In addition to these restrictions the distribution of $\eta$ is not
normal unless the number of columns is indefinitely large. In fact, R. A.
Fisher has pointed out that, for very large samples, $N\eta^2$ tends to be
distributed as $\chi^2$, with degrees of freedom equal to the number of columns
minus 1.

A more rigorous test involves the analysis of variance. For $\rho$, we have



In case (say) a third degree curve has been used as an estimating equation,
answers to any of these questions may be found:

(1) Is the variance explained by a straight line significant?

(2) Is the additional variance explained by a second degree curve
significant?

(3) Is the total variance explained by a second degree curve significant?

(4) Is the additional variance explained by a third degree curve
significant?

(5) Is the total variance explained by a third degree curve significant?

In each case the unexplained variance (that considered as due to chance)
is that remaining after use of the third degree curve. As usual, $n_1$ is the
degrees of freedom of the explained variance, and $n_2$ the degrees of
freedom of the unexplained variance. (If F < 1, the explained variance
is, of course, not significant.)

Notice that the unexplained variance is that remaining after making
use of the estimating equation containing all the constants which have
been computed, rather than that remaining after using the estimating
equation which contains constants of no higher order than those being
tested. Thus, to answer question (3), the unexplained variation is that
remaining after use of constant d rather than the unexplained variation
remaining after use of constant c. The latter quantity is the variation
due, not to chance factors alone, but to chance factors plus constant d.
If the latter quantity had been used to obtain the unexplained, or chance,
variance, the F test might erroneously have failed to show significance for
the constant being tested. It may seem to the reader that use of the
variance remaining after including d in the estimating equation, when it
is the significance of c that is being tested, tends to force a showing of
significance. It is true that this procedure reduces the unexplained
variation, but this fact is counteracted by the decrease in $n_2$, the number of
degrees of freedom remaining. This acts as an offsetting factor in two
ways: (1) Since $n_2$ becomes smaller, the unexplained variance may actually
become larger; (2) F must become larger for the same level of significance
as $n_2$ decreases.

If an equation employing constant d has not been fitted, it is necessary 
to consider the variance unexplained by the second degree curve as the 
chance variance. As mentioned before, there is some loss in accuracy in 
so doing, provided, of course, that the reduction of the unexplained vari- 
ance by use of additional constants is not fortuitous. 

When $\eta$ has been computed, we may discover whether there is any
significant correlation in the data by the use of



If also first, second, and third degree curves have been fitted, each of the 
five questions above may be answered, as well as the additional question: 
Is the additional variation explained by the line of means significant?
To answer any of these questions, the explained variance (or additional 
explained variance) is compared with the variance remaining after use of 
the column means (that is, the variance within columns). To ask whether 
the additional variation explained by the line of means is significant is 
equivalent to asking whether the variation from the fitted curve has been 
significantly reduced, and therefore whether the fitted curve is a satisfac- 
tory hypothesis. 
CHAPTER XXIV

MULTIPLE AND PARTIAL CORRELATION


Preliminary Explanation

Simple correlation. Before plunging into the theory of multiple and
partial correlation it will be useful to review briefly the elementary
principles of correlation, since the more refined measures involve simply an
extension of the principles already discussed. First, an estimating
equation of the type $Y_c = a + bX$ was computed by the method of least
squares. This permitted us to make estimates of the value of the
dependent variable from values of the independent variable. Next, it was
demonstrated that the total variation of the dependent variable was the
sum of the explained variation and the variation which we had failed to
explain by our hypothesis; that is, that $\Sigma y^2 = \Sigma y_c^2 + \Sigma y_s^2$.
It should be remembered that we computed $\Sigma y^2$ by the formula
$\Sigma Y^2 - \bar{Y}\Sigma Y$, and that $\Sigma y_c^2$ was computed by the
expression $\Sigma y_c^2 = \Sigma Y_c^2 - \bar{Y}\Sigma Y$, in which
$\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY$ when we were dealing with simple correlation.

The standard error of estimate $\sigma_{ys}$, which is $\sqrt{\Sigma y_s^2 / N}$,
enabled us to judge the range of error of our estimates of the dependent
variable. Since

$$\Sigma y_s^2 = \Sigma y^2 - \Sigma y_c^2 = \Sigma Y^2 - \Sigma Y_c^2,$$

$\sigma_{ys}$ can be calculated by a process which involves subtracting the
explained variation from the total variation, or most easily from the
expression

$$\sigma_{ys} = \sqrt{\frac{\Sigma Y^2 - \Sigma Y_c^2}{N}}.$$

Finally, a measure was computed that permitted us to state the proportion
of total variation which had been explained by variations in the
computed values of the dependent variable $Y_c$. This ratio

$$r^2 = \frac{\Sigma y_c^2}{\Sigma y^2} = \frac{\Sigma Y_c^2 - \bar{Y}\Sigma Y}{\Sigma Y^2 - \bar{Y}\Sigma Y}$$

was known as the coefficient of determination, and its square root was
called the coefficient of correlation.

Multiple correlation. Exactly the same principles are involved in
multiple correlation as in simple correlation, but the procedure is more
laborious since there is more than one independent variable. Also, it is
necessary to use slightly different symbols. The illustration in this chapter
will deal with the relationship between suicide rates by regions, and aver- 
age age, per cent male, and a business failure index in those same regions. 
Suicide rate is the dependent variable, and the other three are independent 
variables. 

To simplify computations so that they can be shown in full in this 
chapter, the United States has been divided into 18 regions of substantially 
equal population and more or less homogeneous characteristics. With the 
exception of New York state, which has been divided into New York City 
and up-state New York, the boundaries of these regions follow state 
boundaries. The composition of the different regions can be observed by 
reference to Chart 230, which has been so drawn that equal areas on the 
map indicate equality of population. Selection of homogeneous areas of 
equal population serves to make the statistical results more reliable in 
that each region is given proper weight in the calculations, a consideration
which statisticians frequently overlook in geographical correlations. On 
the other hand, use of only 18 observations with an equation of 4 constants 
does make the degrees of freedom dangerously small. The results ob- 
tained must therefore be regarded as primarily of illustrative importance. 

It simplifies the notation somewhat if each of the variables is
designated by the letter X, differentiating between the variables by means of
subscripts. We shall designate our variables in this manner:

Dependent Variable:
    Suicide rate                  $X_1$

Independent Variables:
    Average age                   $X_2$
    Per cent male                 $X_3$
    Business failure index        $X_4$


The first step in the correlation procedure is to obtain an equation which
includes all three of these independent variables as a means of estimating
a suicide rate for any region. The estimate is labeled $X_{c1.234}$, since it is
an estimate of variable $X_1$ computed from variables $X_2$, $X_3$, and $X_4$.
Since there are three independent variables, there will be three b's. The
equation type will be

$$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4.$$

A word concerning the meaning of the b's and their subscripts is necessary.
These net coefficients of estimation indicate the effect on $X_1$ of a change in
the accompanying independent variable when allowance has been made
for the other independent variables. Thus $b_{12.34}$ is an estimate of the
variation in suicide rate associated with a variation in average age,
independent of variation in per cent male or business failures. The social
scientist is accustomed to saying "other things being equal." The other
things which are held equal, i.e., at average value, are in this instance the
proportion of males and the business failure rate in the different regions.
As between regions that have the same per cent male and the same
business failure rate, but differ with respect to age, each variation of one year
in average age between regions will normally be accompanied by a
variation of $b_{12.34}$ in suicide rate. The other b coefficients in the estimating
equation are interpreted analogously, the figures to the right of the decimal
point in the subscript indicating the factors that are held constant. Of
course, really to know the effect on suicides of age alone, we should hold
constant all other factors, not just per cent male and business failures.
As we introduce more and more variables, this desirable situation is more
and more closely approximated. The constant $a_{1.234}$ is the hypothetical
value for suicide rate when the other factors considered have a value of
zero. The estimate of, or normal value for, suicide rate of any region is
the sum of the net amounts associated with each independent variable
plus the value for a.

Chart 230. Eighteen Regions of the United States of Substantially Equal Population
and Homogeneous Characteristics. (On this map the area of each state is proportional
to its population. Texas is not shaded, since it was not included in the death
registration area in 1930.)

We might observe at this point that the natural scientist can often 
design his experiment so as to control a number of the variables, such 
for instance as temperature, humidity, or air pressure. The biologist and 
the agricultural experimenter can control their variables to a considerable 
extent. On the other hand, economics and sociology, and most of the 
social sciences, are observational, rather than experimental, sciences. Since 
workers in these fields usually have only a very limited control over the 
material they must use, they must attempt to hold some of the variables 
constant statistically, rather than experimentally — by means of the mul- 
tiple correlation technique explained in this chapter. 

As in previous instances the total variation of the dependent series is
the sum of the variation in the estimated values of that series and the
variation of the actual values from the estimated values, that is,
$\Sigma x_1^2 = \Sigma x_{c1.234}^2 + \Sigma x_{s1.234}^2$. The procedure in
computing measures of relationship is essentially the same as with simple
correlation. The standard error of estimate is

$$\sigma_{s1.234} = \sqrt{\frac{\Sigma x_{s1.234}^2}{N}},$$

and the coefficient of multiple correlation is

$$R_{1.234} = \sqrt{\frac{\Sigma x_{c1.234}^2}{\Sigma x_1^2}}.$$

$R_{1.234}^2$ states the proportion of total variation that is present in the
variations of the computed, or $X_{c1.234}$, values, and which has been explained
by reference to the independent variables. R has no sign, since the
association may be positive with some, and negative with others, of the
independent variables. It is interesting to note at this point that as
additional associated independent variables are brought into the problem,
$R_{1.234 \ldots m}$ approaches 1.0 and $\sigma_{s1.234 \ldots m}$ approaches zero,
enabling us to make estimates which are progressively more accurate. If we
were able to include all pertinent factors, we could make a perfect estimate
and $R_{1.234 \ldots m}$ would be 1.0.

Partial correlation. The coefficient of partial correlation (for example,
$r_{14.23}$) is a measure of the relationship between the dependent variable and
one independent variable, when the influence of the other independent
variable (or variables) has theoretically been removed from both. The purpose
of these coefficients is to show the relative importance of the different
independent variables in explaining variations in the dependent variable. This
is done by finding the extent to which correlation is increased by the
addition of another constant. More precisely, it may be said that the coefficient
of partial determination (the square of the coefficient of partial correlation)
is the ratio between the increase in the variation of the computed values of
the dependent variable resulting from introducing another independent variable
(that is, the net variation associated with that factor), and the variation that
had not been explained before the introduction of the new factor. The
denominator of this ratio may also be regarded as the total variation which
the new variable seeks to explain.

Turning now to our suicide illustration, a consideration of the two
factors, average age $X_2$ and per cent male $X_3$, results in obtaining computed
values for suicides, which we label $X_{c1.23}$. The variation that has been
explained is indicated by the symbol $\Sigma x_{c1.23}^2$, and that which is still
unexplained is $\Sigma x_{s1.23}^2$. If we now obtain an estimating equation by the
use also of business failures as a third independent variable $X_4$, the
explained variation becomes $\Sigma x_{c1.234}^2$. The increase in the explained
variation is $\Sigma x_{c1.234}^2 - \Sigma x_{c1.23}^2$, and the coefficient of
partial correlation is

$$r_{14.23} = \sqrt{\frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.23}^2}{\Sigma x_{s1.23}^2}}.$$

The subscript 14.23 indicates that the correlation is between suicide rate
($X_1$) and business failure rate ($X_4$) when average age ($X_2$) and per cent
male ($X_3$) have been held constant. If we could pick out regions that
are exactly alike with respect to age and per cent male, the simple
correlation between suicide rate and business failure rate for those regions
would tend to be the same as the above coefficient of partial correlation.
The sign of $r_{14.23}$ is the same as that of $b_{14.23}$ in the estimating equation

$$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4.$$
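
The ratio that defines a coefficient of partial correlation can be sketched in a few lines of Python. The sketch below is only an illustration: it computes the lower-order coefficient $r_{13.2}$ (per cent male, with average age held constant) rather than $r_{14.23}$, using sums and b values quoted in the computation sections of this chapter. The function name `partial_r` is hypothetical, and the explained variation for two variables is assumed to be $b_{12.3}\Sigma x_1x_2 + b_{13.2}\Sigma x_1x_3$, by extension of the simple-correlation expression $b_{12}\Sigma x_1x_2$.

```python
import math


def partial_r(explained_with, explained_without, total, b_sign=1.0):
    """Coefficient of partial correlation from explained variations.

    explained_with    -- explained variation including the new variable
    explained_without -- explained variation before the new variable
    total             -- total variation of the dependent variable
    b_sign            -- any number carrying the sign of the new variable's b
    """
    unexplained_before = total - explained_without
    r_squared = (explained_with - explained_without) / unexplained_before
    return math.copysign(math.sqrt(r_squared), b_sign)


# Figures quoted in the chapter (deviation form).
total_variation = 383.685          # sum of x1^2
explained_by_age = 229.971         # explained by X2 alone
b12_3, b13_2 = 1.692539, 2.211297  # net coefficients for X2 and X3

# Assumed expression for the variation explained by X2 and X3 together.
explained_by_age_and_male = b12_3 * 118.84 + b13_2 * 45.7086

r13_2 = partial_r(explained_by_age_and_male, explained_by_age,
                  total_variation, b_sign=b13_2)
print(round(r13_2, 4))
```

The same function would give $r_{14.23}$ once the variation explained by all three independent variables is available.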


Computation Procedure 

Computation of product sums. Since this chapter will require a con- 
siderable number of measures of relationship between the four variables, 
it will be convenient to compute at one time all the values that are needed 
in the different formulae. The computations of the sums and product 
sums are shown in Table 174. 

There is one thing about this table which is worth special notice — the 
fact that there is an internal check on its accuracy. There are so many 
computations in multiple and partial correlation, each depending on the 
preceding, that it is foolhardy not to check each computation as it is done. 
To provide this check, a separate column is added to the first section of 



TABLE 174

Computation of Product Sums Required for Measures of Relationship Between
Suicide Rates and Average Age, Per Cent Male, and Business Failures,
by 18 Regions of the United States, 1930

          Suicide     Average     Per cent     Business        Check
Region     rate         age         male     failure rate      column
           $X_1$       $X_2$       $X_3$        $X_4$           $X_5$

   1       13.45       31.63       49.18        136.2          230.46
   2       14.95       30.22       50.00        160.4          255.57
   3       20.53       29.95       50.12        181.1          281.70
   4       16.55       31.98       50.20        112.9          211.63
   5       14.22       29.31       50.31         79.5          173.34
   6       17.51       30.71       50.57         88.6          187.39
   7       17.99       30.14       51.47         92.2          191.80
   8       18.16       30.64       50.76        115.4          214.96
   9       17.64       30.14       51.38         75.2          174.36
  10       18.61       31.10       50.46         82.4          182.57
  11       15.93       29.46       51.54         45.0          141.93
  12       12.32       27.38       49.89         76.7          166.29
  13        9.99       26.56       49.47         70.3          156.32
  14       10.40       27.26       50.53         92.3          180.49
  15        7.69       26.03       49.84         68.0          151.56
  16       10.51       26.56       51.35         87.3          175.72
  17       22.20       30.55       53.05        120.2          226.00
  18       26.65       31.47       51.83        116.3          226.25

Total     285.30      531.09      911.95      1,800.0        3,528.34


                                                                Check column
Region    $X_1^2$       $X_1X_2$       $X_1X_3$      $X_1X_4$      $X_1X_5$

   1      180.9025      425.4235       661.4710     1,831.890     3,099.6870
   2      223.5025      451.7890       747.5000     2,397.980     3,820.7715
   3      421.4809      614.8735     1,028.9636     3,717.983     5,783.3010
   4      273.9025      529.2690       830.8100     1,868.495     3,502.4765
   5      202.2084      416.7882       715.4082     1,130.490     2,464.8948
   6      306.6001      537.7321       885.4807     1,551.386     3,281.1989
   7      323.6401      542.2186       925.9453     1,658.678     3,450.4820
   8      329.7856      556.4224       921.8016     2,095.664     3,903.6736
   9      311.1696      531.6696       906.3432     1,326.528     3,075.7104
  10      346.3321      578.7710       939.0606     1,533.464     3,397.6277
  11      253.7649      469.2978       821.0322       716.850     2,260.9449
  12      151.7824      337.3216       614.6448       944.944     2,048.6928
  13       99.8001      265.3344       494.2053       702.297     1,561.6368
  14      108.1600      283.5040       525.5120       959.920     1,877.0960
  15       59.1361      200.1707       383.2696       522.920     1,165.4964
  16      110.4601      279.1456       539.6885       917.523     1,846.8172
  17      492.8400      678.2100     1,177.7100     2,668.440     5,017.2000
  18      710.2225      838.6755     1,381.2695     3,099.395     6,029.5625

Total   4,905.6904    8,536.6165    14,500.1161    29,644.847    57,587.2700


TABLE 174 (Continued)

                                                       Check column
Region     $X_2^2$        $X_2X_3$       $X_2X_4$        $X_2X_5$

   1     1,000.4569     1,555.5634      4,308.006      7,289.4498
   2       913.2484     1,511.0000      4,847.288      7,723.3254
   3       897.0025     1,501.0940      5,423.945      8,436.9150
   4     1,022.7204     1,605.3960      3,610.542      6,767.9274
   5       859.0761     1,474.5861      2,330.145      5,080.5954
   6       943.1041     1,553.0047      2,720.906      5,754.7469
   7       908.4196     1,551.3058      2,778.908      5,780.8520
   8       938.8096     1,555.2864      3,535.856      6,586.3744
   9       908.4196     1,548.5932      2,266.528      5,255.2104
  10       967.2100     1,569.3060      2,562.640      5,677.9270
  11       867.8916     1,518.3684      1,325.700      4,181.2578
  12       749.6644     1,365.9882      2,100.046      4,553.0202
  13       705.4336     1,313.9232      1,867.168      4,151.8592
  14       743.1076     1,377.4478      2,516.098      4,920.1574
  15       677.5609     1,297.3352      1,770.040      3,945.1068
  16       705.4336     1,363.8560      2,318.688      4,667.1232
  17       933.3025     1,620.6775      3,672.110      6,904.3000
  18       990.3609     1,631.0901      3,659.961      7,120.0875

Total   15,731.2223    26,913.8220     53,614.575    104,796.2358


                                        Check column
Region     $X_3^2$        $X_3X_4$        $X_3X_5$

   1     2,418.6724      6,698.316     11,334.0228
   2     2,500.0000      8,020.000     12,778.5000
   3     2,512.0144      9,076.732     14,118.8040
   4     2,520.0400      5,667.580     10,623.8260
   5     2,531.0961      3,999.645      8,720.7354
   6     2,557.3249      4,480.502      9,476.3123
   7     2,649.1609      4,745.534      9,871.9460
   8     2,576.5776      5,857.704     10,911.3696
   9     2,639.9044      3,863.776      8,958.6168
  10     2,546.2116      4,157.904      9,212.4822
  11     2,656.3716      2,319.300      7,315.0722
  12     2,489.0121      3,826.563      8,296.2081
  13     2,447.2809      3,477.741      7,733.1504
  14     2,553.2809      4,663.919      9,120.1597
  15     2,484.0256      3,389.120      7,553.7504
  16     2,636.8225      4,482.855      9,023.2220
  17     2,814.3025      6,376.610     11,989.3000
  18     2,686.3489      6,027.829     11,726.5375

Total   46,218.4473     91,131.630    178,764.0154
TABLE 174 (Continued)

                         Check column
Region     $X_4^2$         $X_4X_5$

   1      18,550.44       31,388.652
   2      25,728.16       40,993.428
   3      32,797.21       51,015.870
   4      12,746.41       23,893.027
   5       6,320.25       13,780.530
   6       7,849.96       16,602.754
   7       8,500.84       17,683.960
   8      13,317.16       24,806.384
   9       5,655.04       13,111.872
  10       6,789.76       15,043.768
  11       2,025.00        6,386.850
  12       5,882.89       12,754.443
  13       4,942.09       10,989.296
  14       8,519.29       16,659.227
  15       4,624.00       10,306.080
  16       7,621.29       15,340.356
  17      14,448.04       27,165.200
  18      13,525.69       26,312.875

Total    199,843.52      374,234.572


Using the formulae given on page 747, the computations are checked as follows:

$\Sigma X_5 = 285.30 + 531.09 + 911.95 + 1,800.0 = 3,528.34.$

$\Sigma X_1X_5 = 4,905.6904 + 8,536.6165 + 14,500.1161 + 29,644.847 = 57,587.2700.$

$\Sigma X_2X_5 = 8,536.6165 + 15,731.2223 + 26,913.8220 + 53,614.575 = 104,796.2358.$

$\Sigma X_3X_5 = 14,500.1161 + 26,913.8220 + 46,218.4473 + 91,131.630 = 178,764.0154.$

$\Sigma X_4X_5 = 29,644.847 + 53,614.575 + 91,131.630 + 199,843.52 = 374,234.572.$

Source: Computed from data found in publications listed below:

Average age: United States Department of Commerce, Bureau of the Census, Fifteenth Census of the
United States, 1930, Volume II.

Per cent male: United States Department of Commerce, Bureau of the Census, Abstract of the Fifteenth
Census of the United States, 1930.

Business failure rates: United States Department of Commerce, Bureau of Foreign and Domestic
Commerce, Statistical Abstract of the United States, 1931, and Dun and Bradstreet, Inc.

Suicide rates: United States Department of Commerce, Bureau of the Census, Mortality Statistics, 1930,
and Abstract of the Fifteenth Census of the United States, 1930.
this table, labeled $X_5$. Each item in this column, including its total, is
the sum of the other items in the same row. If, therefore, the sum of the
items in the $X_5$ column equals the sum of the totals of columns $X_1$, $X_2$,
$X_3$, $X_4$, the multiplications and additions are assumed to be correct. The
right-hand column of each other section is also a check column. The
checks are provided in each section by verifying the following identities:

$\Sigma X_1 + \Sigma X_2 + \Sigma X_3 + \Sigma X_4 = \Sigma X_5.$

$\Sigma X_1^2 + \Sigma X_1X_2 + \Sigma X_1X_3 + \Sigma X_1X_4 = \Sigma X_1X_5.$

$\Sigma X_1X_2 + \Sigma X_2^2 + \Sigma X_2X_3 + \Sigma X_2X_4 = \Sigma X_2X_5.$

$\Sigma X_1X_3 + \Sigma X_2X_3 + \Sigma X_3^2 + \Sigma X_3X_4 = \Sigma X_3X_5.$

$\Sigma X_1X_4 + \Sigma X_2X_4 + \Sigma X_3X_4 + \Sigma X_4^2 = \Sigma X_4X_5.$
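
The check-column identities can be illustrated with a short Python sketch. It rebuilds the sums and product sums from the eighteen rows of the first section of Table 174 and confirms that each row of product sums adds up to its check total. The names `ROWS` and `product_sums` are hypothetical, chosen only for this illustration.

```python
# (X1, X2, X3, X4) for the 18 regions, from the first section of Table 174.
ROWS = [
    (13.45, 31.63, 49.18, 136.2), (14.95, 30.22, 50.00, 160.4),
    (20.53, 29.95, 50.12, 181.1), (16.55, 31.98, 50.20, 112.9),
    (14.22, 29.31, 50.31, 79.5), (17.51, 30.71, 50.57, 88.6),
    (17.99, 30.14, 51.47, 92.2), (18.16, 30.64, 50.76, 115.4),
    (17.64, 30.14, 51.38, 75.2), (18.61, 31.10, 50.46, 82.4),
    (15.93, 29.46, 51.54, 45.0), (12.32, 27.38, 49.89, 76.7),
    (9.99, 26.56, 49.47, 70.3), (10.40, 27.26, 50.53, 92.3),
    (7.69, 26.03, 49.84, 68.0), (10.51, 26.56, 51.35, 87.3),
    (22.20, 30.55, 53.05, 120.2), (26.65, 31.47, 51.83, 116.3),
]


def product_sums(rows):
    """Return column totals and the matrix of product sums, X5 included."""
    data = [r + (sum(r),) for r in rows]  # append check variable X5
    k = len(data[0])
    totals = [sum(r[i] for r in data) for i in range(k)]
    cross = [[sum(r[i] * r[j] for r in data) for j in range(k)]
             for i in range(k)]
    return totals, cross


totals, cross = product_sums(ROWS)

# Identity 1: the four column totals add up to the total of the check column.
assert abs(sum(totals[:4]) - totals[4]) < 1e-6

# Identities 2 to 5: each row of product sums adds up to its check product sum.
for i in range(4):
    assert abs(sum(cross[i][:4]) - cross[i][4]) < 1e-6

print(round(totals[4], 2), round(cross[0][4], 4))
```

A check of this kind detects slips in multiplication or addition; it cannot detect an error in copying the original data, since such an error enters both sides of every identity.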

By converting all of the product sums of Table 174 into deviations
from the different means, the labor of computation will be materially
lightened. This is because any straight line fitted by the method of least
squares always passes through the means of the series and therefore a in
the estimating equation becomes zero; and since there is one less constant
to find, there is one less normal equation. To put the matter concretely,
in our present problem, we may find directly the estimating equation

$$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4,$$

which requires simultaneous solution of the four normal equations:

$\Sigma X_1 = N a_{1.234} + b_{12.34}\Sigma X_2 + b_{13.24}\Sigma X_3 + b_{14.23}\Sigma X_4.$

$\Sigma X_1X_2 = a_{1.234}\Sigma X_2 + b_{12.34}\Sigma X_2^2 + b_{13.24}\Sigma X_2X_3 + b_{14.23}\Sigma X_2X_4.$

$\Sigma X_1X_3 = a_{1.234}\Sigma X_3 + b_{12.34}\Sigma X_2X_3 + b_{13.24}\Sigma X_3^2 + b_{14.23}\Sigma X_3X_4.$

$\Sigma X_1X_4 = a_{1.234}\Sigma X_4 + b_{12.34}\Sigma X_2X_4 + b_{13.24}\Sigma X_3X_4 + b_{14.23}\Sigma X_4^2.$

A more expeditious procedure consists in using the estimating equation in
terms of deviations. Thus

$$x_{c1.234} = b_{12.34}x_2 + b_{13.24}x_3 + b_{14.23}x_4.$$

Values for this equation are obtained by use of the normal equations in
$x_1$, $x_2$, $x_3$, and $x_4$ instead of $X_1$, $X_2$, $X_3$, and $X_4$. Since
$\Sigma x = 0$, the first normal equation disappears, the others becoming:

$\Sigma x_1x_2 = b_{12.34}\Sigma x_2^2 + b_{13.24}\Sigma x_2x_3 + b_{14.23}\Sigma x_2x_4.$

$\Sigma x_1x_3 = b_{12.34}\Sigma x_2x_3 + b_{13.24}\Sigma x_3^2 + b_{14.23}\Sigma x_3x_4.$

$\Sigma x_1x_4 = b_{12.34}\Sigma x_2x_4 + b_{13.24}\Sigma x_3x_4 + b_{14.23}\Sigma x_4^2.$
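
As an illustration of the simultaneous solution, the three normal equations in deviation form can be solved by ordinary elimination. The sketch below is one possible arrangement, not the computing routine of the text; the helper `solve_linear` is hypothetical, and the deviation product sums are those quoted elsewhere in this chapter.

```python
def solve_linear(coeffs, rhs):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [y] for row, y in zip(coeffs, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# Deviation product sums quoted in the chapter.
A = [
    [61.4119, 6.7372, 505.575],    # sum x2^2, sum x2x3, sum x2x4
    [6.7372, 15.5138, -63.370],    # sum x2x3, sum x3^2, sum x3x4
    [505.575, -63.370, 19843.52],  # sum x2x4, sum x3x4, sum x4^2
]
y = [118.840, 45.7086, 1114.847]   # sum x1x2, sum x1x3, sum x1x4

b12_34, b13_24, b14_23 = solve_linear(A, y)
print(b12_34, b13_24, b14_23)
```

Because the matrix of product sums is symmetric, only the six distinct sums on and above the diagonal need to be computed.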

To convert the product sums of Table 174 into deviation product sums
we must subtract a correction factor from each:¹

¹ The derivation of these equations is fairly obvious. The first and last will be
taken as illustrations.

$\Sigma x_1^2 = \Sigma (X_1 - \bar{X}_1)^2$

$= \Sigma (X_1^2 - 2\bar{X}_1X_1 + \bar{X}_1^2)$

$= \Sigma X_1^2 - 2\bar{X}_1\Sigma X_1 + N\bar{X}_1^2$
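$\Sigma x_1^2 = \Sigma X_1^2 - \bar{X}_1\Sigma X_1;$

$\Sigma x_2^2 = \Sigma X_2^2 - \bar{X}_2\Sigma X_2;$

$\Sigma x_2x_3 = \Sigma X_2X_3 - \bar{X}_2\Sigma X_3$, or $\Sigma X_2X_3 - \bar{X}_3\Sigma X_2;$

$\Sigma x_2x_4 = \Sigma X_2X_4 - \bar{X}_2\Sigma X_4$, or $\Sigma X_2X_4 - \bar{X}_4\Sigma X_2;$

$\Sigma x_1x_2 = \Sigma X_1X_2 - \bar{X}_2\Sigma X_1$, or $\Sigma X_1X_2 - \bar{X}_1\Sigma X_2;$

$\Sigma x_3^2 = \Sigma X_3^2 - \bar{X}_3\Sigma X_3;$

$\Sigma x_3x_4 = \Sigma X_3X_4 - \bar{X}_3\Sigma X_4$, or $\Sigma X_3X_4 - \bar{X}_4\Sigma X_3;$

$\Sigma x_1x_3 = \Sigma X_1X_3 - \bar{X}_3\Sigma X_1$, or $\Sigma X_1X_3 - \bar{X}_1\Sigma X_3;$

$\Sigma x_4^2 = \Sigma X_4^2 - \bar{X}_4\Sigma X_4;$

$\Sigma x_1x_4 = \Sigma X_1X_4 - \bar{X}_4\Sigma X_1$, or $\Sigma X_1X_4 - \bar{X}_1\Sigma X_4.$

These computations are made in Table 175, which has an internal check
similar to that of Table 174. For instance, to verify $\Sigma x_4^2$ and the
product sums involving $x_4$, we have:

$\Sigma x_1x_4 + \Sigma x_2x_4 + \Sigma x_3x_4 + \Sigma x_4^2 = \Sigma x_4x_5;$

$1,114.847 + 505.575 - 63.370 + 19,843.52 = 21,400.572.$

The diagram given on page 750 shows the method of making each check.
In the diagram the dotted arrows indicate the product sums to be added in
order to obtain the totals recorded in the $x_5$ column.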
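
The conversion to deviation product sums may be sketched as follows, starting from the totals of Table 174. The dictionary names and the helper `deviation_sum` are hypothetical; the arithmetic is simply the correction-factor rule, $\Sigma x_ix_j = \Sigma X_iX_j - \bar{X}_i\Sigma X_j$.

```python
N = 18

# Column totals from Table 174.
sums = {1: 285.30, 2: 531.09, 3: 911.95, 4: 1800.0}

# Sums of squares and products from Table 174, keyed by variable pair.
raw = {
    (1, 1): 4905.6904, (1, 2): 8536.6165, (1, 3): 14500.1161,
    (1, 4): 29644.847, (2, 2): 15731.2223, (2, 3): 26913.8220,
    (2, 4): 53614.575, (3, 3): 46218.4473, (3, 4): 91131.630,
    (4, 4): 199843.52,
}


def deviation_sum(i, j):
    """Sum of x_i * x_j: raw product sum less mean(X_i) times total of X_j."""
    key = (min(i, j), max(i, j))
    return raw[key] - (sums[i] / N) * sums[j]


dev = {key: deviation_sum(*key) for key in raw}

# Internal check for the x4 column.
check_x4 = dev[(1, 4)] + dev[(2, 4)] + dev[(3, 4)] + dev[(4, 4)]

for key in sorted(dev):
    print(key, round(dev[key], 4))
print("check for x4:", round(check_x4, 3))
```

Either form of the correction factor gives the same result, since $\bar{X}_i\Sigma X_j$ and $\bar{X}_j\Sigma X_i$ both equal $\Sigma X_i\Sigma X_j / N$.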

Computation of gross measures of relationship. Simple correlation is
in reality gross correlation, since it measures the relationship between
two variables, without any adjustment by correlation technique for the
effects of other variables. Using the symbols developed in the
introductory section, we should compute the following measures if we wish to
correlate suicide rates $X_1$ with average age $X_2$ alone:

Normal equations:

I. $\Sigma X_1 = N a_{1.2} + b_{12}\Sigma X_2.$

II. $\Sigma X_1X_2 = a_{1.2}\Sigma X_2 + b_{12}\Sigma X_2^2$, or $\Sigma x_1x_2 = b_{12}\Sigma x_2^2.$

Estimating equation:

$X_{c1.2} = a_{1.2} + b_{12}X_2$, or $x_{c1.2} = b_{12}x_2.$

Sum of squares of computed values:

$\Sigma X_{c1.2}^2 = a_{1.2}\Sigma X_1 + b_{12}\Sigma X_1X_2$ (sum of explained squares),
or $\Sigma x_{c1.2}^2 = b_{12}\Sigma x_1x_2$ (explained variation).

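¹ (continued)

$= \Sigma X_1^2 - 2\bar{X}_1\Sigma X_1 + \bar{X}_1\Sigma X_1$

$= \Sigma X_1^2 - \bar{X}_1\Sigma X_1.$

$\Sigma x_1x_4 = \Sigma [(X_4 - \bar{X}_4)(X_1 - \bar{X}_1)]$

$= \Sigma (X_1X_4 - \bar{X}_4X_1 - \bar{X}_1X_4 + \bar{X}_1\bar{X}_4)$

$= \Sigma X_1X_4 - \bar{X}_4\Sigma X_1 - \bar{X}_1\Sigma X_4 + N\bar{X}_1\bar{X}_4$

$= \Sigma X_1X_4 - \bar{X}_4\Sigma X_1 - \frac{\Sigma X_1\Sigma X_4}{N} + \frac{\Sigma X_1\Sigma X_4}{N}$

$= \Sigma X_1X_4 - \bar{X}_4\Sigma X_1.$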



TABLE 175 

Computation of Deviation Product Sums Required for Measures of Relationship Between Suicide Rates and Age, Per Cent 

Male, and Business Failures, by 18 Regions of the United States, 1930 

Sums and Means 



Source: Table 174. 
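Sum of squares of deviations from estimates (unexplained variation):

$\Sigma x_{s1.2}^2 = \Sigma X_1^2 - \Sigma X_{c1.2}^2$, or $\Sigma x_1^2 - \Sigma x_{c1.2}^2.$

Standard error of estimate:

$$\sigma_{s1.2} = \sqrt{\frac{\Sigma x_{s1.2}^2}{N}} = \sqrt{\frac{\Sigma X_1^2 - \Sigma X_{c1.2}^2}{N}}, \quad \text{or} \quad \sqrt{\frac{\Sigma x_1^2 - \Sigma x_{c1.2}^2}{N}}.$$

Coefficient of correlation:

$$r_{12} = \sqrt{\frac{\Sigma X_{c1.2}^2 - \bar{X}_1\Sigma X_1}{\Sigma X_1^2 - \bar{X}_1\Sigma X_1}}, \quad \text{or} \quad \sqrt{\frac{\Sigma x_{c1.2}^2}{\Sigma x_1^2}}.$$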

The careful reader will already have noticed that we are merely setting
down the different equations and formulae used in simple correlation,
with slightly different symbols. The coefficient of correlation $r_{12}$ is
sometimes called a zero order coefficient, since there are no additional
independent variables held constant statistically.

Results of computations based on these expressions are given below;
for each measure the data are taken first in their original form, and then as
deviations from means. All values are found in or derived from
Table 175.

Normal equations:

    Original form:
        I.  $285.30 = 18a_{1.2} + 531.09b_{12}.$
        II. $8,536.6165 = 531.09a_{1.2} + 15,731.2223b_{12}.$
    Deviation form:
        II. $118.840 = 61.4119b_{12}.$

Estimating equation:

    Original form:   $X_{c1.2} = -41.246051 + 1.9351314X_2.$
    Deviation form:  $x_{c1.2} = 1.93513x_2.$

The equation $x_{c1.2} = 1.93513x_2$ may be converted into $X_{c1.2} = -41.246
+ 1.93513X_2$ by ascertaining the value of $a_{1.2}$ from the expression
$a_{1.2} = \bar{X}_1 - \bar{X}_2b_{12}$. Thus

$a_{1.2} = 15.85 - (29.505)(1.93513) = -41.246.$

Sum of explained squares (original form):

    $\Sigma X_{c1.2}^2 = (-41.246051)(285.30) + (1.9351314)(8,536.6165) = 4,751.976.$

Explained variation (deviation form):

    $\Sigma x_{c1.2}^2 = (1.93513)(118.840) = 229.971.$

Sum of squares of deviations from estimates:

    Original form:   $\Sigma x_{s1.2}^2 = 4,905.6904 - 4,751.976 = 153.7144.$
    Deviation form:  $\Sigma x_{s1.2}^2 = 383.6854 - 229.971 = 153.7144.$

Standard error of estimate:

    Original form:   $\sigma_{s1.2}^2 = \frac{4,905.690 - 4,751.976}{18} = 8.540.$
    Deviation form:  $\sigma_{s1.2}^2 = \frac{383.685 - 229.971}{18} = 8.540.$

    In either case, $\sigma_{s1.2} = 2.922$ suicides per 100,000.

Coefficient of correlation:

    Original form:
        $r_{12}^2 = \frac{4,751.976 - (15.850)(285.30)}{4,905.690 - (15.850)(285.30)} = \frac{4,751.976 - 4,522.005}{4,905.690 - 4,522.005} = .5994.$
    Deviation form:
        $r_{12}^2 = \frac{229.971}{383.685} = .5994.$

    In either case, $r_{12} = +.7742.$
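
The deviation-form computations for suicide rate and average age can be retraced with a brief Python sketch. The variable names are hypothetical; the inputs are the deviation sums and means quoted above, and the sign of r is taken from the sign of b, as in the text.

```python
import math

N = 18
sum_x1_sq = 383.6854   # total variation of suicide rate
sum_x2_sq = 61.4119    # variation of average age
sum_x1x2 = 118.840     # deviation product sum
mean_x1, mean_x2 = 15.85, 29.505

b12 = sum_x1x2 / sum_x2_sq             # from normal equation II
a12 = mean_x1 - mean_x2 * b12          # constant for the original-form equation
explained = b12 * sum_x1x2             # explained variation
unexplained = sum_x1_sq - explained    # unexplained variation
std_error = math.sqrt(unexplained / N)
r12 = math.copysign(math.sqrt(explained / sum_x1_sq), b12)

print(round(b12, 5), round(a12, 3))
print(round(explained, 3), round(unexplained, 3))
print(round(std_error, 3), round(r12, 4))
```

Working from deviations involves one division for b, whereas the original-form equations require the solution of two simultaneous equations.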


Chart 231 shows scatter diagrams of the simple relationship between 
suicide rates and each of the independent variables being considered. The 
standard errors of estimate and coefficients of simple correlation for these 
relationships are: 

Suicide rate and average age:

$\sigma_{s1.2}$ = 2.922 suicides per 100,000; $r_{12}$ = +.7742.

Suicide rate and per cent male:

$\sigma_{s1.3}$ = 3.719 suicides per 100,000; $r_{13}$ = +.5925.

Suicide rate and business failure index:

$\sigma_{s1.4}$ = 4.124 suicides per 100,000; $r_{14}$ = +.4040.

The evidence from these coefficients of correlation indicates that age is 
a fairly important factor bearing on suicide, and that per cent male and 
business affairs are of lesser importance in the order named. Age and 
per cent male are not necessarily ultimate causes of suicide, but the
ultimate causes, whatever they are, seem to have a heavier incidence on men
than on women, and on the old than on the young. On the other hand, 
recent studies have pointed to the conclusion that more women attempt 
suicide than men, but that men are more successful in killing themselves. 
Perhaps business failure may be thought of as a more fundamental cause 
of suicide. At any rate, economic factors are most commonly blamed by 
men who attempt suicide, while domestic difficulties are most commonly 
blamed by women. 

Further information will be yielded by a careful study of Chart 232, 
Section A of this chart indicates the deviations of suicide rates from their 
mean, while section B shows the deviations in the estimates of suicide rate 
from their mean, that is, the individual explained variations. With sev- 
eral notable exceptions, the bars in this section appear about the same as 
in section A. Finally, section C indicates the individual variations that 
have not yet been accounted for; that is, the deviations of the actual sui- 
cide rates from the estimated rates. These deviations are obtained for 
each region by subtracting (algebraically) the value of the estimate from 
the actual value. Inspection of the chart will permit the reader roughly 
to verify the magnitude of the bars in section C. Since these unexplained
variations are obtained by a subtraction process, they are often called
residuals. The reader is already aware that, if the distances represented
by each bar in this chart be squared, the sum of the squared values
corresponding to sections B and C would equal those of section A.

Chart 232. ... of Suicide Rates from Their Mean ... Average Age as the
Independent Variable. (... Table 174.)

In general, the bars in section C are much smaller than those in section A, 
but there are some exceptions. In the cases, for instance, of upper New 
England, North Atlantic, and up-state New York, it would have been 
more accurate to have guessed the suicide rates to have been 15.85, the 
simple average for the United States, than to have used the estimating 
equation. Confining ourselves now to the poorest estimates, we see from 
section C that we have yet to explain why the suicide rate was so low in
upper New England, North Atlantic, and up-state New York, and why
so high in New York City, the Northwest states, and California.

Chart 233. Scatter Diagrams of Per Cent Male ($X_3$) and Business Failure Index
($X_4$), Compared with Suicide Rate Adjusted for Average Age ($x_{s1.2}$). (Derived from
data of Table 174.) Section A: per cent male $X_3$ and suicide rate adjusted for
average age $x_{s1.2}$. Section B: business failure index $X_4$ and suicide rate
adjusted for average age $x_{s1.2}$.

Some clue to this difficulty is afforded by reference to Chart 233. In
each section of this chart the dependent variable is the individual
unexplained variations in suicide rate ($x_{s1.2} = x_1 - x_{c1.2}$) which were shown
in section C of Chart 232. From section A of Chart 233 it is seen that
regions 1, 2, and 4, which show large negative residuals in suicide rate,
are low in per cent male also, while regions 17 and 18 are high, both with
respect to positive residuals and per cent male. On the other hand, the
per cent of males in New York City (region 3) appears to be below average.
From section B of this chart we find that the business failure index number
for New York City is exceptionally high, though business failures do not
seem to explain very well the residuals in regions 1, 2, and 4. From an
examination of the two sections of Chart 233 it is evident that we can
reduce the errors of our estimates, and improve our correlation, more by
including per cent male as a second factor than by including business
failures. Consequently the next part of this chapter will correlate suicide
rates with average age and per cent male simultaneously.

Before leaving this section, it is well to record values which were
computed in connection with the simple correlation coefficients and which will
be needed in subsequent sections.

Sums of squares:

These are not needed if all computations are made originally from
deviations from means.

Total amount:

$\Sigma X_1^2 = 4,905.690.$

Amount explained by gross estimating equations:

$\Sigma X_{c1.2}^2 = 4,751.976.$

$\Sigma X_{c1.3}^2 = 4,656.676.$

$\Sigma X_{c1.4}^2 = 4,584.639.$

Sums of squared deviations (measures of variation):

These are obtained, by appropriate formulae, directly from the data in
deviation form, or they may be obtained by subtracting the correction
factor $\bar{X}_1\Sigma X_1 = 4,522.005$ from each of the above expressions.

Total variation:

$\Sigma x_1^2 = 383.685.$

Variation explained by gross estimating equations:

$\Sigma x_{c1.2}^2 = 229.971.$

$\Sigma x_{c1.3}^2 = 134.671.$

$\Sigma x_{c1.4}^2 = 62.634.$

Two independent variables: multiple correlation. Naturally we can
expect to estimate suicide rates more accurately if we take two independent
variables into consideration, rather than only one. Hence let us make
estimates from both average age and per cent male. The estimating
equation type is

$$X_{c1.23} = a_{1.23} + b_{12.3}X_2 + b_{13.2}X_3,$$

or, in terms of deviations,

$$x_{c1.23} = b_{12.3}x_2 + b_{13.2}x_3.$$

The 1.23 subscripts after X and a tell us that we are estimating values
of $X_1$ (suicide rates) from variables $X_2$ (average age) and $X_3$ (per cent
male). The first b tells the normal change in suicide rates associated
with a unit change in average age for regions that have the same per cent
male composition; the second b tells us the normal change in suicide rates
associated with a unit change in per cent male for regions of the same
average age.

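The normal equations required are: 

\[
\begin{aligned}
\text{I.}\quad \Sigma X_1 &= N a_{1.23} + b_{12.3} \Sigma X_2 + b_{13.2} \Sigma X_3; \\
\text{II.}\quad \Sigma X_1 X_2 &= a_{1.23} \Sigma X_2 + b_{12.3} \Sigma X_2^2 + b_{13.2} \Sigma X_2 X_3; \\
\text{III.}\quad \Sigma X_1 X_3 &= a_{1.23} \Sigma X_3 + b_{12.3} \Sigma X_2 X_3 + b_{13.2} \Sigma X_3^2.
\end{aligned}
\]

Making the required substitutions, we have: 

\[
\begin{aligned}
\text{I.}\quad 285.30 &= 18 a_{1.23} + 531.09 b_{12.3} + 911.95 b_{13.2}; \\
\text{II.}\quad 8,536.6165 &= 531.09 a_{1.23} + 15,731.2223 b_{12.3} + 26,913.8220 b_{13.2}; \\
\text{III.}\quad 14,500.1161 &= 911.95 a_{1.23} + 26,913.8220 b_{12.3} + 46,218.4473 b_{13.2}.
\end{aligned}
\]

Solving these three equations gives 

\[ X_{c1.23} = -146.12082 + 1.6925398 X_2 + 2.2112877 X_3. \]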

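Some labor may be saved if the normal equations are put in terms of 
deviations from the means. In this case the first equation disappears, 
since $\Sigma x_1$, $\Sigma x_2$, and $\Sigma x_3$ are each zero. The equations are: 

\[
\begin{aligned}
\text{II.}\quad \Sigma x_1 x_2 &= b_{12.3} \Sigma x_2^2 + b_{13.2} \Sigma x_2 x_3; \\
\text{III.}\quad \Sigma x_1 x_3 &= b_{12.3} \Sigma x_2 x_3 + b_{13.2} \Sigma x_3^2.
\end{aligned}
\]

Solving these equations simultaneously: 

\[
\begin{aligned}
\text{II.}\quad 118.84 &= 61.4119 b_{12.3} + 6.7372 b_{13.2}; \\
\text{III.}\quad 45.7086 &= 6.7372 b_{12.3} + 15.5138 b_{13.2}.
\end{aligned}
\]

We have 

\[ x_{c1.23} = 1.692539 x_2 + 2.211297 x_3. \]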

These $b$ values agree closely with those obtained before. From the latter 
estimating equation $a_{1.23}$ is found by the expression² 

\[
\begin{aligned}
a_{1.23} &= \bar{X}_1 - b_{12.3} \bar{X}_2 - b_{13.2} \bar{X}_3 \\
&= 15.85 - (1.692539)(29.505) - (2.211297)(50.6639) \\
&= -146.121.
\end{aligned}
\]
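The arithmetic of the deviation-form solution can be checked mechanically. The following is a minimal sketch in Python, not part of the original computation; the variable names are illustrative assumptions, and the inputs are the sums and means quoted above. It solves normal equations II and III by determinants and then recovers the constant term from the means.

```python
# Sums of squares and cross products in deviation form, as quoted in the text.
sum_x2x2 = 61.4119   # sum of x2^2 (average age)
sum_x3x3 = 15.5138   # sum of x3^2 (per cent male)
sum_x2x3 = 6.7372    # sum of x2 * x3
sum_x1x2 = 118.84    # sum of x1 * x2
sum_x1x3 = 45.7086   # sum of x1 * x3

# Means of the three variables, as quoted in the text.
mean_x1, mean_x2, mean_x3 = 15.85, 29.505, 50.6639

# Solve the two normal equations (II and III) by determinants.
det = sum_x2x2 * sum_x3x3 - sum_x2x3 ** 2
b12_3 = (sum_x1x2 * sum_x3x3 - sum_x2x3 * sum_x1x3) / det
b13_2 = (sum_x2x2 * sum_x1x3 - sum_x2x3 * sum_x1x2) / det

# The constant term follows from the means of the variables.
a1_23 = mean_x1 - b12_3 * mean_x2 - b13_2 * mean_x3

print(round(b12_3, 4), round(b13_2, 4), round(a1_23, 2))
```

The results should agree with the coefficients in the text to about four significant figures; small differences in later digits arise from the rounding of the input sums.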

The value for the explained sum of squares is obtained by an expression 
analogous to that derived in Appendix B, section XXII-1, equation 3: 

\[
\begin{aligned}
\Sigma X_{c1.23}^2 &= a_{1.23} \Sigma X_1 + b_{12.3} \Sigma X_1 X_2 + b_{13.2} \Sigma X_1 X_3 \\
&= (-146.12082)(285.30) + (1.6925398)(8,536.6165) + (2.2112877)(14,500.1161) \\
&= 4,824.222.
\end{aligned}
\]

2 See Appendix B, section XXIV-1. 

The explained variation may be obtained by subtracting $\bar{X}_1 \Sigma X_1 = 
4,522.005$ from the above value; or, if the deviation form is preferred, 
the variation is computed directly: 

\[
\begin{aligned}
\Sigma x_{c1.23}^2 &= b_{12.3} \Sigma x_1 x_2 + b_{13.2} \Sigma x_1 x_3 \\
&= (1.692539)(118.840) + (2.211297)(45.7086) \\
&= 302.217.
\end{aligned}
\]


The measures of relationship are now computed in a fashion precisely 
similar to that employed when there was only one independent variable. 


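\[ \sigma_{s1.23}^2 = \frac{\Sigma X_1^2 - \Sigma X_{c1.23}^2}{N} = \frac{4,905.690 - 4,824.222}{18} = \frac{81.468}{18} = 4.526. \]

\[ \sigma_{s1.23} = 2.127. \]

\[ R_{1.23}^2 = \frac{\Sigma X_{c1.23}^2 - \bar{X}_1 \Sigma X_1}{\Sigma X_1^2 - \bar{X}_1 \Sigma X_1} = \frac{4,824.222 - 4,522.005}{4,905.690 - 4,522.005} = \frac{302.217}{383.685} = .7877. \]

\[ R_{1.23} = .8875. \]

When the data are in deviation form, 

\[ \sigma_{s1.23}^2 = \frac{\Sigma x_1^2 - \Sigma x_{c1.23}^2}{N} = \frac{383.685 - 302.217}{18} = \frac{81.468}{18} = 4.526. \]

\[ \sigma_{s1.23} = 2.127. \]

\[ R_{1.23}^2 = \frac{\Sigma x_{c1.23}^2}{\Sigma x_1^2} = \frac{302.217}{383.685} = .7877. \]

\[ R_{1.23} = .8875. \]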

This coefficient of multiple determination ($R_{1.23}^2$) is the proportion of 
total variation that is present in the computed, or $X_{c1.23}$, values, and 
which has therefore been explained by reference to variables $X_2$ and $X_3$; 
the coefficient of multiple correlation ($R_{1.23}$) is the square root of the 
proportion of variation in suicide rates between regions explained by 
reference to the values of average age and per cent male in the various 
regions. 
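The measures of relationship follow directly from the total and explained variation. The sketch below, in Python, is offered only as an illustration; the names are assumptions, and the inputs are the deviation-form values quoted above.

```python
import math

n_regions = 18
total_variation = 383.685       # sum of x1^2
explained_variation = 302.217   # sum of x_c1.23^2

# Unexplained variation, variance of estimate, and standard error of estimate.
unexplained_variation = total_variation - explained_variation
variance_of_estimate = unexplained_variation / n_regions
std_error_of_estimate = math.sqrt(variance_of_estimate)

# Coefficient of multiple determination and of multiple correlation.
r_squared = explained_variation / total_variation
multiple_r = math.sqrt(r_squared)

print(round(std_error_of_estimate, 3), round(r_squared, 4), round(multiple_r, 4))
```

The same few lines serve for any number of independent variables; only the explained variation changes.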

In similar fashion we obtain the corresponding measures of relationship 
from other combinations of two of the independent variables. The three 
possible combinations are as indicated below. 


Suicide rate with average age and per cent male: 

\[ X_{c1.23} = -146.12082 + 1.6925398 X_2 + 2.2112877 X_3; \]
\[ x_{c1.23} = 1.69254 x_2 + 2.21129 x_3. \]
\[ \Sigma X_{c1.23}^2 = 4,824.222; \qquad \Sigma x_{c1.23}^2 = 302.217. \]
\[ \sigma_{s1.23}^2 = 4.526; \qquad \sigma_{s1.23} = 2.127. \]
\[ R_{1.23}^2 = .7877; \qquad R_{1.23} = .8875. \]

Suicide rate with average age and business failure index: 

\[ X_{c1.24} = -40.00222 + 1.863474 X_2 + .00870415 X_4; \]
\[ x_{c1.24} = 1.86347 x_2 + .00870415 x_4. \]
\[ \Sigma X_{c1.24}^2 = 4,753.164; \qquad \Sigma x_{c1.24}^2 = 231.159. \]
\[ \sigma_{s1.24}^2 = 8.474; \qquad \sigma_{s1.24} = 2.911. \]
\[ R_{1.24}^2 = .6025; \qquad R_{1.24} = .7762. \]

Suicide rate with per cent male and business failure index: 

\[ X_{c1.34} = -153.82095 + 3.2177785 X_3 + .06645785 X_4; \]
\[ x_{c1.34} = 3.21778 x_3 + .0664578 x_4. \]
\[ \Sigma X_{c1.34}^2 = 4,743.184; \qquad \Sigma x_{c1.34}^2 = 221.179. \]
\[ \sigma_{s1.34}^2 = 9.028; \qquad \sigma_{s1.34} = 3.005. \]
\[ R_{1.34}^2 = .5765; \qquad R_{1.34} = .7593. \]

It is to be noted that the two best combinations include the factor of 
average age; the two poorest, business failure index. This would suggest 
that age is the most important of the three factors having to do with 
suicide rates, and business failures the least important. Although this is 
the same rank in importance that was found when coefficients of simple 
(gross) correlation were used, such is not necessarily the case. 

A visual impression of our progress is afforded by Chart 234, which 
shows: deviations of suicide rates from their mean ($x_1$); deviations from 
their mean of computed suicide rates, based upon the estimating equation 
using average age and per cent male as independent variables ($x_{c1.23}$); 
and deviations of suicide rates from computed rates ($x_{s1.23}$). The bars 
representing $x_{c1.23}$, which are in section B, are the individual explained 
variations, while those representing $x_{s1.23}$, which are in section C, are 
the individual unexplained variations. First it should be observed that the 
bars in section B of Chart 234 are somewhat longer than the corresponding 
bars in section B of Chart 232, and that they parallel more closely those 
of section A. In mathematical language, the explained variation has 
increased from $\Sigma x_{c1.2}^2 = 229.971$ and $\Sigma x_{c1.3}^2 = 134.671$ to 
$\Sigma x_{c1.23}^2 = 302.217$. Because this is true, the correlation increased 
from $r_{12} = +.7742$ and $r_{13} = +.5925$ to $R_{1.23} = .8875$. Likewise, of 
course, the unexplained variations represented by the bars in section C 
have been reduced somewhat. Correspondingly, the standard error of estimate 
has declined from $\sigma_{s1.2} = 2.922$ and $\sigma_{s1.3} = 3.719$ to 
$\sigma_{s1.23} = 2.127$. It is always the case that, as more pertinent 
variables are introduced, the standard error of estimate becomes smaller, 
and the coefficient of multiple correlation larger. This is true even if 
some of the independent variables are negatively associated with the 
dependent variable. It is nevertheless true that suicide rates are 
considerably below our estimates for upper New England and up-state New 
York, and far above our estimates for New York City and California. In fact 
our estimate for New York City is worse than before. If we consult Chart 
235, however, we shall see that the high suicide rates in these latter two 
regions (3 and 18) may partially be explained by business failures in those 
regions; but on the other hand, in upper New England (1) and up-state New 
York (4), where the lowness of the suicide rates is not accounted for, the 
business failure index is also higher than the United States average. It 
remains to be seen whether business failures, as such, are an important 
explanation of suicide. As judged by Chart 235, the relationship does not 
seem to be very close. 

Two independent variables: partial correlation. When only one inde- 
pendent variable (age) was considered, the deviations in our estimates 
were as shown in section B of Chart 232. By including an additional 
variable (per cent male) these explained deviations were increased to the 
amounts shown in Chart 234. In terms of symbols: 

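Variation explained by age and per cent male: 

\[ \Sigma x_{c1.23}^2 = \Sigma X_{c1.23}^2 - \bar{X}_1 \Sigma X_1 \]

Variation explained by age alone: 

\[ \Sigma x_{c1.2}^2 = \Sigma X_{c1.2}^2 - \bar{X}_1 \Sigma X_1 \]

Increase in variation explained by per cent male: 

\[ \Sigma x_{c1.23}^2 - \Sigma x_{c1.2}^2 = \Sigma X_{c1.23}^2 - \Sigma X_{c1.2}^2 \]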

Chart 235. Scatter Diagram of Business Failure Index ($X_4$) and Suicide 
Rate Adjusted for Average Age and Per Cent Male ($x_{s1.23}$). (Derived from 
data of Table 174.) 

After taking age alone into consideration, the deviations remaining to be 
explained were those shown in Chart 232C. To summarize in terms of 
symbols: 

\[ \Sigma x_{s1.2}^2 = \Sigma X_1^2 - \Sigma X_{c1.2}^2, \quad \text{or} \quad \Sigma x_1^2 - \Sigma x_{c1.2}^2. \]

The proportion of the variation previously unexplained, then, which was 
explained by including per cent male also, is the ratio 

\[ \frac{\Sigma X_{c1.23}^2 - \Sigma X_{c1.2}^2}{\Sigma X_1^2 - \Sigma X_{c1.2}^2}, \]

or, if the deviation method is used, 

\[ \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.2}^2}{\Sigma x_1^2 - \Sigma x_{c1.2}^2}. \]

This ratio is known as the coefficient of partial determination, the square 
root of which is the coefficient of partial correlation. Using the values 
already computed, therefore, we find: 

\[ r_{13.2}^2 = \frac{\Sigma X_{c1.23}^2 - \Sigma X_{c1.2}^2}{\Sigma X_1^2 - \Sigma X_{c1.2}^2} = \frac{4,824.222 - 4,751.976}{4,905.690 - 4,751.976} = \frac{72.246}{153.714} = .4700; \]

\[ r_{13.2}^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.2}^2}{\Sigma x_1^2 - \Sigma x_{c1.2}^2} = \frac{302.217 - 229.971}{383.685 - 229.971} = \frac{72.246}{153.714} = .4700. \]

\[ r_{13.2} = +.6856. \]

The sign of this coefficient of partial correlation is taken from the sign of 
$b_{13.2}$ in the estimating equation. This coefficient is a measure of the 
closeness of relationship between suicide rate and per cent male when age 
has been held constant statistically; it is the simple correlation coefficient 
which would be expected for regions of the same average age. 
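The coefficient of partial determination is simply the added explained variation expressed as a share of the variation previously unexplained. A minimal sketch in Python follows; the function name is a hypothetical one chosen for illustration, and the inputs are the deviation-form values quoted above.

```python
import math

def partial_determination(explained_with, explained_without, total):
    """Share of the previously unexplained variation that is explained
    by adding one more independent variable."""
    return (explained_with - explained_without) / (total - explained_without)

total_variation = 383.685          # sum of x1^2
explained_by_age = 229.971         # sum of x_c1.2^2
explained_by_age_and_male = 302.217  # sum of x_c1.23^2

r2_13_2 = partial_determination(explained_by_age_and_male,
                                explained_by_age, total_variation)
# The sign is taken from b13.2, which is positive in this problem.
r_13_2 = math.sqrt(r2_13_2)

print(round(r2_13_2, 4), round(r_13_2, 4))
```

The same function applies when a different variable is the one added; only the two explained variations passed to it change.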

As a companion measure to $r_{13.2}$, we should obtain the partial coefficient 
$r_{12.3}$, which measures the relationship between suicide rate and age when 
per cent male has been held constant. This is done by finding the increase 
in the variation of the computed values by using age and per cent male in 
our estimating equation rather than using per cent male alone. Thus: 

\[ r_{12.3}^2 = \frac{\Sigma X_{c1.23}^2 - \Sigma X_{c1.3}^2}{\Sigma X_1^2 - \Sigma X_{c1.3}^2} = \frac{4,824.222 - 4,656.676}{4,905.690 - 4,656.676} = \frac{167.546}{249.014} = .6728; \]

\[ r_{12.3}^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2} = \frac{302.216 - 134.671}{383.685 - 134.671} = \frac{167.545}{249.014} = .6728. \]

\[ r_{12.3} = +.8202. \]


The gross, or simple correlation between suicides and age, it will be 
recalled, was +.774. Removing the effect of variations in per cent male 
from both variables has increased the relationship materially, to +.820. 
Perhaps, however, the reader will be surprised to find a coefficient of 
multiple correlation of only .888, and coefficients of partial correlation 
of +.686 and +.820. It is not a characteristic of these types of measures 
that the multiple coefficient is the sum of the two partial coefficients. The 
relationship is more complex than that.³ It may be said, however, that 
for given values of $r_{12}$ and $r_{13}$ having the same sign, the less the 
duplication in the independent variables (and so, the lower their positive, 
or the higher their negative, correlation; $r_{23}$ in this case), the higher 
will be the multiple correlation.⁴ In the present instance $r_{23} = +.218$, 
and hence the addition of either age or per cent male materially improves 
the estimate over that obtained from either alone. To aid the reader in 
seeing the interrelationships among the independent variables, scatter 
diagrams are shown in Chart 236, together with the correlation coefficients 
$r_{23}$, $r_{24}$, and $r_{34}$. 
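The effect of duplication among the independent variables can be examined numerically with the standard expression for the squared multiple coefficient in terms of the three simple coefficients, $R_{1.23}^2 = (r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}) / (1 - r_{23}^2)$. The Python sketch below is illustrative only; the two alternative values of $r_{23}$ (zero and +.6) are assumptions chosen for comparison and do not come from the data.

```python
def multiple_r_squared(r12, r13, r23):
    """Squared multiple correlation of variable 1 on variables 2 and 3,
    computed from the three simple (zero order) coefficients."""
    return (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)

r12, r13 = 0.7742, 0.5925

observed = multiple_r_squared(r12, r13, 0.2181)      # r23 as found in the data
no_overlap = multiple_r_squared(r12, r13, 0.0)       # hypothetical: no duplication
heavy_overlap = multiple_r_squared(r12, r13, 0.6)    # hypothetical: much duplication

print(round(observed, 4), round(no_overlap, 4), round(heavy_overlap, 4))
```

With $r_{12}$ and $r_{13}$ both positive, the result falls as $r_{23}$ rises, which is the behavior described in the text. Any trial value of $r_{23}$ must be consistent with the given $r_{12}$ and $r_{13}$; the two hypothetical values used here satisfy that condition.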

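Other multiple and partial coefficients are: 

\[ R_{1.24}^2 = \frac{\Sigma X_{c1.24}^2 - \bar{X}_1 \Sigma X_1}{\Sigma X_1^2 - \bar{X}_1 \Sigma X_1} = \frac{4,753.164 - 4,522.005}{4,905.690 - 4,522.005} = .6024, \quad \text{or} \]

\[ R_{1.24}^2 = \frac{\Sigma x_{c1.24}^2}{\Sigma x_1^2} = \frac{231.159}{383.685} = .6024. \]

\[ R_{1.24} = .7762. \]

\[ r_{14.2}^2 = \frac{\Sigma X_{c1.24}^2 - \Sigma X_{c1.2}^2}{\Sigma X_1^2 - \Sigma X_{c1.2}^2} = \frac{4,753.164 - 4,751.976}{4,905.690 - 4,751.976} = \frac{1.188}{153.714} = .0077, \quad \text{or} \]

\[ r_{14.2}^2 = \frac{\Sigma x_{c1.24}^2 - \Sigma x_{c1.2}^2}{\Sigma x_1^2 - \Sigma x_{c1.2}^2} = \frac{231.159 - 229.971}{383.685 - 229.971} = \frac{1.188}{153.714} = .0077. \]

\[ r_{14.2} = +.0879. \]

\[ r_{12.4}^2 = \frac{\Sigma X_{c1.24}^2 - \Sigma X_{c1.4}^2}{\Sigma X_1^2 - \Sigma X_{c1.4}^2} = \frac{4,753.164 - 4,584.639}{4,905.690 - 4,584.639} = \frac{168.525}{321.051} = .5249, \quad \text{or} \]

\[ r_{12.4}^2 = \frac{\Sigma x_{c1.24}^2 - \Sigma x_{c1.4}^2}{\Sigma x_1^2 - \Sigma x_{c1.4}^2} = \frac{231.159 - 62.634}{383.685 - 62.634} = \frac{168.525}{321.051} = .5249. \]

\[ r_{12.4} = +.7245. \]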


3 The relationship is as follows: 

\[ R_{1.23}^2 = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}. \]

In this case 

\[ R_{1.23}^2 = \frac{.5994 + .3510 - 2(.7742)(.5925)(.2181)}{1 - .0476} = .7878. \]

\[ R_{1.23} = .8876. \]

4 However, if $r_{12}$ and $r_{13}$ have different signs, then, the lower the 
negative or the higher the positive correlation of $r_{23}$, the higher the 
value of $R_{1.23}$. The reader can verify these statements by assuming 
various values for $r_{12}$, $r_{13}$, and $r_{23}$ and using the expression for 
$R_{1.23}$ given in footnote 3. Values of $r_{23}$ inconsistent with the given 
values of $r_{12}$ and $r_{13}$ must not be used. 




of the United States, 1930. (Data of Table 174.) 




The results of computing these partial coefficients lead to the same con- 
clusions as do the multiple coefficients. Looking at the partial r’s, age is 
seen to be more closely related to suicides than is per cent male ; per cent 
male more closely than business failures; and as might be supposed, age 
more closely than business failures. 

It remains now to be seen whether the conclusions concerning the rela- 
tive importance of our three independent variables will remain tenable 
when all four variables are considered simultaneously, rather than as 
different combinations of three variables. This problem will be considered 
in the following section. For the sake of simplicity, and since the reader 
should be sufficiently experienced with the longer procedure by now, data 
throughout the discussion will be used only in the form of deviations 
from means. 

Three independent variables: multiple correlation. It is perhaps 
unnecessarily repetitive to go through with the same process again with one 
more variable added. The procedure is similar regardless of the number 
of variables. However, we have not yet definitely discovered how closely 
we can predict from all three independent variables, age, per cent male, 
and business failures; nor have we determined the relative importance of 
these factors. 

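The estimating equation and three normal equations required are as 
follows. 

Estimating equation: 

\[ x_{c1.234} = b_{12.34} x_2 + b_{13.24} x_3 + b_{14.23} x_4. \]

Normal equations: 

\[
\begin{aligned}
\text{II.}\quad \Sigma x_1 x_2 &= b_{12.34} \Sigma x_2^2 + b_{13.24} \Sigma x_2 x_3 + b_{14.23} \Sigma x_2 x_4; \\
\text{III.}\quad \Sigma x_1 x_3 &= b_{12.34} \Sigma x_2 x_3 + b_{13.24} \Sigma x_3^2 + b_{14.23} \Sigma x_3 x_4; \\
\text{IV.}\quad \Sigma x_1 x_4 &= b_{12.34} \Sigma x_2 x_4 + b_{13.24} \Sigma x_3 x_4 + b_{14.23} \Sigma x_4^2.
\end{aligned}
\]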

If original data were used rather than deviations, four normal equations 
would be required (as shown on page 747); in such a case it would probably 
be advisable to use the Doolittle method of simultaneous solution, 
which was described on pp. 716-720. Inserting the appropriate values 
(found in Table 175) in the above three equations, we have: 

\[
\begin{aligned}
\text{II.}\quad 118.840 &= 61.412 b_{12.34} + 6.737 b_{13.24} + 505.575 b_{14.23}; \\
\text{III.}\quad 45.709 &= 6.737 b_{12.34} + 15.514 b_{13.24} - 63.370 b_{14.23}; \\
\text{IV.}\quad 1,114.847 &= 505.575 b_{12.34} - 63.370 b_{13.24} + 19,843.520 b_{14.23}.
\end{aligned}
\]

These equations solved simultaneously yield the estimating equation 

\[ x_{c1.234} = 1.445402 x_2 + 2.429389 x_3 + .02711406 x_4. \]

But 

\[
\begin{aligned}
a_{1.234} &= \bar{X}_1 - b_{12.34} \bar{X}_2 - b_{13.24} \bar{X}_3 - b_{14.23} \bar{X}_4 \\
&= 15.85 - (1.445402)(29.505) - (2.429389)(50.66339) - (.02711406)(100) \\
&= -152.589.
\end{aligned}
\]

Therefore 

\[ X_{c1.234} = -152.589 + 1.445402 X_2 + 2.429389 X_3 + .02711406 X_4. \]

The variation of the computed values is 

\[
\begin{aligned}
\Sigma x_{c1.234}^2 &= b_{12.34} \Sigma x_1 x_2 + b_{13.24} \Sigma x_1 x_3 + b_{14.23} \Sigma x_1 x_4 \\
&= (1.445402)(118.840) + (2.429389)(45.709) + (.02711406)(1,114.847) \\
&= 313.044.
\end{aligned}
\]

The other measures of relationship now are 

\[ \sigma_{s1.234}^2 = \frac{\Sigma x_1^2 - \Sigma x_{c1.234}^2}{N} = \frac{383.685 - 313.044}{18} = \frac{70.641}{18} = 3.924. \]

\[ \sigma_{s1.234} = 1.981. \]

\[ R_{1.234}^2 = \frac{\Sigma x_{c1.234}^2}{\Sigma x_1^2} = \frac{313.044}{383.685} = .8159. \]

\[ R_{1.234} = .9033. \]

The coefficient of correlation has become progressively larger as the 
number of variables has been increased. Thus 

\[ r_{12} = +.7742, \qquad R_{1.23} = .8875, \qquad R_{1.234} = .9033. \]

As explained earlier, neither the coefficients of correlation nor the 
coefficients of determination are additive to produce the higher multiple 
measures, on account of the duplication of elements involved. The 
coefficients of correlation become larger and the standard errors of 
estimate become smaller as more factors are added, because the explained 
variance becomes larger and the unexplained variance becomes smaller. The 
square root of the unexplained variance is, of course, $\sigma_s$. The gradual 
reduction of $\sigma_s$ as more factors are introduced is shown below. 

\[ \sigma_{s1.2} = 2.922 \qquad \sigma_{s1.3} = 3.719 \qquad \sigma_{s1.4} = 4.124 \]

\[ \sigma_{s1.2} = 2.922 \qquad \sigma_{s1.23} = 2.127 \qquad \sigma_{s1.234} = 1.981 \]
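The whole of the three-variable computation, from the normal equations in deviation form to the standard error of estimate and the coefficient of multiple determination, may be retraced in a few lines. The Python sketch below is an illustration under stated assumptions: the small elimination routine `solve_linear` is a hypothetical helper written for this example (it is not the Doolittle layout described earlier in the book), and the inputs are the rounded sums from the normal equations above.

```python
import math

def solve_linear(coefficients, constants):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(constants)
    m = [row[:] + [rhs] for row, rhs in zip(coefficients, constants)]
    for col in range(n):
        # Bring the largest remaining entry of this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (m[row][n] - known) / m[row][row]
    return solution

# Sums of squares and cross products of x2, x3, x4 (deviation form).
moments = [
    [61.412, 6.737, 505.575],
    [6.737, 15.514, -63.370],
    [505.575, -63.370, 19843.520],
]
# Cross products of x1 with x2, x3, x4.
cross_with_x1 = [118.840, 45.709, 1114.847]

b = solve_linear(moments, cross_with_x1)   # b12.34, b13.24, b14.23

total_variation = 383.685
n_regions = 18
explained = sum(bi * ci for bi, ci in zip(b, cross_with_x1))
std_error = math.sqrt((total_variation - explained) / n_regions)
r_squared = explained / total_variation

print([round(v, 5) for v in b], round(explained, 3),
      round(std_error, 3), round(r_squared, 4))
```

Because the sums are entered to three decimals only, the coefficients may differ from those in the text in the last digit or two.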


Chart 237, of the now familiar type, shows the deviations in suicide 
rates explained by our three independent variables and shows also the 
remaining unexplained deviations. The addition of the economic factor 
appears, on the whole, to have improved our estimates very little. 
Although the error in the New York City estimate is considerably reduced, 
several of the others have been increased somewhat, and the discrepancy 
in the case of California is nearly as great as before. Apparently the 
peculiar factors affecting the California suicide rate have not been included. 

Three independent variables: partial correlation. Proceeding now in 
the usual fashion, we obtain partial correlation coefficients as follows: 

\[ r_{12.34}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.34}^2}{\Sigma x_1^2 - \Sigma x_{c1.34}^2} = \frac{313.044 - 221.179}{383.685 - 221.179} = \frac{91.865}{162.506} = .5653. \]

\[ r_{12.34} = +.7519. \]

\[ r_{13.24}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.24}^2}{\Sigma x_1^2 - \Sigma x_{c1.24}^2} = \frac{313.044 - 231.159}{383.685 - 231.159} = \frac{81.885}{152.526} = .5369. \]

\[ r_{13.24} = +.7327. \]

\[ r_{14.23}^2 = \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.23}^2}{\Sigma x_1^2 - \Sigma x_{c1.23}^2} = \frac{313.044 - 302.217}{383.685 - 302.217} = \frac{10.827}{81.468} = .1329. \]

\[ r_{14.23} = +.3646. \]

It might be thought that, as additional factors are held constant, the 
dependent variable would be progressively less closely associated with a 
given independent variable. For instance, the correlation between suicides 
$X_1$ and business failures $X_4$ was found to be $r_{14} = +.4040$; but 
when the age factor $X_2$ was also brought into the picture (technically, 
when suicide rates and business failure index numbers were each adjusted 
for variations in average age), we had $r_{14.2} = +.0879$. What appeared 
to be a relationship between business failures and suicides was in fact 
largely a relationship between average age and suicide rates. On the 
other hand, $r_{13} = +.5925$ increased to $r_{13.2} = +.6856$ when age $X_2$ 
was taken into consideration. In this case the average age had varied in 
the different regions in such a way as to obscure the co-variation of per 
cent male $X_3$ and suicide rate $X_1$. 

The reader should not necessarily conclude from these measures that 
differences in mental traits attributable to age and sex make for 
susceptibility to the urge of self-destruction. It may well be that older 
persons and males are more liable to find themselves confronted with 
situations leading to despondency. Thus financial worries have their first 
incidence on the chief breadwinner of the family, usually a mature male. 
Also, certain diseases of old age may partially account for the higher 
suicide rates among older persons. Whatever the conditions leading to 
suicide, it would appear that, taken together, they are fairly constant from 
region to region, but that they vary in their incidence with age and with 
the proportion of males in the population. There are some exceptions to 
this statement, notably California. The introduction of more variables 
is needed to improve the accuracy of our estimates for this and some of the 
other regions, and so to increase the magnitude of our multiple coefficient 
of correlation. 

Another Approach to Multiple and Partial Correlation 
Partial coefficients. The fact that partial correlation coefficients some- 
times become higher and sometimes lower as more variables are held con- 
stant may be more clearly understood when we learn how the coefficients 
of higher order may be derived from those of lower order. The values of 
the coefficients of zero order for the suicide study we have determined to 
be: 

\[ r_{12} = +.7742; \qquad r_{13} = +.5925; \qquad r_{14} = +.4040; \]

\[ r_{23} = +.2183; \qquad r_{24} = +.4579; \]

\[ r_{34} = -.1142. \]

As previously stated, they are called zero order coefficients because no 
variables are held constant. From these zero order coefficients, first order 
coefficients, with one variable held constant, may be computed. Below 
are given the formulae for the nine possible coefficients which may be 
computed for this problem, together with the substitutions and results. 
(Three others, $r_{23.1}$, $r_{24.1}$, and $r_{34.1}$, have not been included, 
since they hold $X_1$ constant and do not concern this problem.) Strictly 
speaking, only six of these coefficients are required for further 
computations, either the first six or the last six, although for checking 
purposes all nine may be desired.⁵ 


\[ r_{12.4} = \frac{r_{12} - (r_{14})(r_{24})}{\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{24}^2}} = \frac{.7742 - (.4040)(.4579)}{(.9148)(.8890)} = +.7245; \]

\[ r_{13.4} = \frac{r_{13} - (r_{14})(r_{34})}{\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{34}^2}} = \frac{.5925 - (.4040)(-.1142)}{(.9148)(.9935)} = +.7026; \]

\[ r_{23.4} = \frac{r_{23} - (r_{24})(r_{34})}{\sqrt{1 - r_{24}^2}\,\sqrt{1 - r_{34}^2}} = \frac{.2183 - (.4579)(-.1142)}{(.8890)(.9935)} = +.3064; \]

\[ r_{12.3} = \frac{r_{12} - (r_{13})(r_{23})}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.7742 - (.5925)(.2183)}{(.8056)(.9759)} = +.8203; \]

\[ r_{14.3} = \frac{r_{14} - (r_{13})(r_{34})}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{34}^2}} = \frac{.4040 - (.5925)(-.1142)}{(.8056)(.9935)} = +.5893; \]

\[ r_{24.3} = \frac{r_{24} - (r_{23})(r_{34})}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{34}^2}} = \frac{.4579 - (.2183)(-.1142)}{(.9759)(.9935)} = +.4979; \]

\[ r_{13.2} = \frac{r_{13} - (r_{12})(r_{23})}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{.5925 - (.7742)(.2183)}{(.6329)(.9759)} = +.6857; \]

\[ r_{14.2} = \frac{r_{14} - (r_{12})(r_{24})}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{24}^2}} = \frac{.4040 - (.7742)(.4579)}{(.6329)(.8890)} = +.0880; \]

\[ r_{34.2} = \frac{r_{34} - (r_{23})(r_{24})}{\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{24}^2}} = \frac{-.1142 - (.2183)(.4579)}{(.9759)(.8890)} = -.2469. \]

5 Proof that these formulae are the equivalent of those we have been using 
is given in Appendix B, section XXIV-2. The labor of computation can be 
materially shortened if values of $\sqrt{1 - r^2}$ are looked up in J. R. 
Miner, Tables of $\sqrt{1 - r^2}$ and $1 - r^2$ for Use in Partial Correlation 
and Trigonometry, Johns Hopkins Press, Baltimore, 1922. 


Six of these coefficients, those which correlate suicide rates with another 
variable, have previously been computed. Except for slight discrepancies 
in the fourth digit due to rounding, the two sets of results are in agreement. 
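Since all nine first order coefficients follow one pattern, the computation lends itself to a single routine. The following Python sketch is illustrative; the function name `first_order_partial` is an assumption, and the inputs are the zero order coefficients listed above.

```python
import math

def first_order_partial(r_ab, r_ac, r_bc):
    """Correlation between a and b with c held constant, computed from
    the three zero order coefficients among a, b, and c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Zero order coefficients for the suicide study.
r12, r13, r14 = 0.7742, 0.5925, 0.4040
r23, r24, r34 = 0.2183, 0.4579, -0.1142

r12_4 = first_order_partial(r12, r14, r24)   # suicide and age, failures constant
r13_2 = first_order_partial(r13, r12, r23)   # suicide and per cent male, age constant
r14_2 = first_order_partial(r14, r12, r24)   # suicide and failures, age constant
r34_2 = first_order_partial(r34, r23, r24)   # per cent male and failures, age constant

print(round(r12_4, 4), round(r13_2, 4), round(r14_2, 4), round(r34_2, 4))
```

The values should agree with those in the text to about three digits, the remaining differences being due to the rounding of the square roots in the hand computation.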

For a four-variable problem there are three second order coefficients 
involving $X_1$ as the dependent variable. These may be computed: 

\[ r_{12.34} = \frac{r_{12.4} - (r_{13.4})(r_{23.4})}{\sqrt{1 - r_{13.4}^2}\,\sqrt{1 - r_{23.4}^2}} = \frac{.7245 - (.7026)(.3064)}{(.7115)(.9519)} = +.7518; \]

\[ r_{13.24} = \frac{r_{13.4} - (r_{12.4})(r_{23.4})}{\sqrt{1 - r_{12.4}^2}\,\sqrt{1 - r_{23.4}^2}} = \frac{.7026 - (.7245)(.3064)}{(.6893)(.9519)} = +.7325; \]

\[ r_{14.23} = \frac{r_{14.3} - (r_{12.3})(r_{24.3})}{\sqrt{1 - r_{12.3}^2}\,\sqrt{1 - r_{24.3}^2}} = \frac{.5893 - (.8203)(.4979)}{(.5719)(.8672)} = +.3647. \]


The above formulae employ the first six of the first order coefficients. If 
desired, the following formulae, using the last six coefficients, may be 
employed as a check. 

\[ r_{12.34} = \frac{r_{12.3} - (r_{14.3})(r_{24.3})}{\sqrt{1 - r_{14.3}^2}\,\sqrt{1 - r_{24.3}^2}} = \frac{.8203 - (.5893)(.4979)}{(.8079)(.8672)} = +.7521. \]

\[ r_{13.24} = \frac{r_{13.2} - (r_{14.2})(r_{34.2})}{\sqrt{1 - r_{14.2}^2}\,\sqrt{1 - r_{34.2}^2}} = \frac{.6857 - (.0880)(-.2469)}{(.9961)(.9690)} = +.7329. \]

\[ r_{14.23} = \frac{r_{14.2} - (r_{13.2})(r_{34.2})}{\sqrt{1 - r_{13.2}^2}\,\sqrt{1 - r_{34.2}^2}} = \frac{.0880 - (.6857)(-.2469)}{(.7279)(.9690)} = +.3648. \]


Again we find agreement, to three digits, with the same measures 
computed by the other method. If there are five variables, the formula for 
$r_{12.345}$ is 

\[ r_{12.345} = \frac{r_{12.45} - (r_{13.45})(r_{23.45})}{\sqrt{1 - r_{13.45}^2}\,\sqrt{1 - r_{23.45}^2}}, \]

or 

\[ r_{12.345} = \frac{r_{12.35} - (r_{14.35})(r_{24.35})}{\sqrt{1 - r_{14.35}^2}\,\sqrt{1 - r_{24.35}^2}}, \]

or 

\[ r_{12.345} = \frac{r_{12.34} - (r_{15.34})(r_{25.34})}{\sqrt{1 - r_{15.34}^2}\,\sqrt{1 - r_{25.34}^2}}. \]

If the reader has followed the exposition thus far, he will easily be able to 
construct formulae for the other third order coefficients, and also for those 
of higher order. 
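The step from each order to the next is the same, so coefficients of any order may be built up by applying the rule repeatedly. The Python sketch below is one way to express this; the function `partial_r` and the dictionary of zero order coefficients are illustrative assumptions, and no claim is made that this is the most economical arrangement for hand work.

```python
import math

# Zero order coefficients for the suicide study, keyed by the pair of variables.
zero_order = {
    frozenset((1, 2)): 0.7742, frozenset((1, 3)): 0.5925,
    frozenset((1, 4)): 0.4040, frozenset((2, 3)): 0.2183,
    frozenset((2, 4)): 0.4579, frozenset((3, 4)): -0.1142,
}

def partial_r(a, b, held=()):
    """Partial correlation of variables a and b with the variables listed
    in `held` held constant, built up one order at a time."""
    if not held:
        return zero_order[frozenset((a, b))]
    c, rest = held[0], held[1:]
    r_ab = partial_r(a, b, rest)
    r_ac = partial_r(a, c, rest)
    r_bc = partial_r(b, c, rest)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

r12_34 = partial_r(1, 2, (3, 4))
r13_24 = partial_r(1, 3, (2, 4))
r14_23 = partial_r(1, 4, (2, 3))

print(round(r12_34, 4), round(r13_24, 4), round(r14_23, 4))
```

Changing the order in which the variables are held constant, for example `partial_r(1, 2, (4, 3))`, corresponds to using the alternative check formulae, and gives the same result apart from rounding in the last decimal places.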

$\sigma_s$ and $R$. It is possible now to obtain other measures of relationship 
by use of the different coefficients of partial correlation. For four 
variables the formulae and their application to the problem at hand are:⁶ 

\[
\begin{aligned}
\sigma_{s1.234}^2 &= \frac{\Sigma x_1^2 \left[ (1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2) \right]}{N} \\
&= \frac{(383.685)(.8368)(.5063)(.4348)}{18} = \frac{(383.685)(.1842)}{18} \\
&= 3.926.
\end{aligned}
\]

\[ \sigma_{s1.234} = 1.981. \]

\[ R_{1.234}^2 = 1 - \left[ (1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2) \right] = 1 - .1842 = .8158. \]

\[ R_{1.234} = .9032. \]

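It will be remembered from simple correlation that one approach considered 
$r$ as the slope of the estimating line in terms of the standard deviation 
of the different variables. In terms of symbols used in this chapter, 

\[ r_{12} = b_{12} \div \frac{\sigma_1}{\sigma_2}, \quad \text{and} \quad b_{12} = r_{12} \frac{\sigma_1}{\sigma_2}. \]

But 

\[ \sigma_1 = \frac{\sigma_{s1.2}}{\sqrt{1 - r_{12}^2}}, \qquad \sigma_2 = \frac{\sigma_{s2.1}}{\sqrt{1 - r_{21}^2}}, \quad \text{and} \quad r_{12}^2 = r_{21}^2. \]

Therefore 

\[ \frac{\sigma_1}{\sigma_2} = \frac{\sigma_{s1.2}}{\sigma_{s2.1}}, \]

and substituting we find that 

\[ r_{12} = b_{12} \div \frac{\sigma_{s1.2}}{\sigma_{s2.1}}. \]

By analogy, then, we have 

\[ r_{12.34} = b_{12.34} \div \frac{\sigma_{s1.234}}{\sigma_{s2.134}}, \quad \text{and} \quad b_{12.34} = r_{12.34} \frac{\sigma_{s1.234}}{\sigma_{s2.134}}. \]

To obtain $\sigma_{s2.134}$, it is convenient to use the formula 

\[
\begin{aligned}
\sigma_{s2.134}^2 &= \frac{\Sigma x_2^2 (1 - r_{24}^2)(1 - r_{23.4}^2)(1 - r_{12.34}^2)}{N} \\
&= \frac{(61.4119)(.7903)(.9061)(.4348)}{18} \\
&= 1.062.
\end{aligned}
\]

\[ \sigma_{s2.134} = 1.031. \]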

6 For an explanation of these formulae, see Appendix B, section XXIV-3. 

Proceeding, we find that 

\[ b_{12.34} = .7518 \cdot \frac{1.981}{1.031} = 1.444. \]

As in our earlier illustrations, there is a slight discrepancy in the fourth 
digit between this and the previously described method, due to rounding. The 
other coefficients of estimation may be obtained by substituting in the 
formulae 

\[ b_{13.24} = r_{13.24} \frac{\sigma_{s1.234}}{\sigma_{s3.124}}, \]

\[ b_{14.23} = r_{14.23} \frac{\sigma_{s1.234}}{\sigma_{s4.123}}. \]

Although several variations of the formulae for the different standard 
errors are possible, the following are convenient in that they require 
only the correlation coefficients given on pages 770-771: 

\[ \sigma_{s3.124}^2 = \frac{\Sigma x_3^2 (1 - r_{34}^2)(1 - r_{23.4}^2)(1 - r_{13.24}^2)}{N}, \]

\[ \sigma_{s4.123}^2 = \frac{\Sigma x_4^2 (1 - r_{34}^2)(1 - r_{24.3}^2)(1 - r_{14.23}^2)}{N}. \]
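The passage from the partial coefficients to the standard errors, to $R^2$, and back to a coefficient of estimation can be followed step by step. The Python sketch below is illustrative only; the variable names are assumptions, and the inputs are the rounded coefficients and sums given in the text.

```python
import math

n_regions = 18
sum_x1_sq = 383.685   # total variation of suicide rates
sum_x2_sq = 61.4119   # total variation of average age

# Coefficients taken from the text (rounded to four places).
r14, r13_4, r12_34 = 0.4040, 0.7026, 0.7518
r24, r23_4 = 0.4579, 0.3064

# Share of the variation in X1 left unexplained by X2, X3, and X4.
unexplained_share = (1 - r14 ** 2) * (1 - r13_4 ** 2) * (1 - r12_34 ** 2)
var_s1_234 = sum_x1_sq * unexplained_share / n_regions
r_sq_1_234 = 1 - unexplained_share

# Corresponding variance of estimate when X2 is treated as dependent.
var_s2_134 = (sum_x2_sq * (1 - r24 ** 2) * (1 - r23_4 ** 2)
              * (1 - r12_34 ** 2) / n_regions)

# Net coefficient of estimation recovered from the partial coefficient.
b12_34 = r12_34 * math.sqrt(var_s1_234 / var_s2_134)

print(round(var_s1_234, 3), round(r_sq_1_234, 4), round(b12_34, 3))
```

The value of $b_{12.34}$ obtained in this way should agree with the solution of the normal equations to about three digits; $b_{13.24}$ and $b_{14.23}$ follow in the same manner from their own standard errors.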


Other Measures of the Individual Importance of the Independent Variables 

It will be recognized that the methods of obtaining partial correlation coefficients 
which have been described are very laborious, since they necessitate either the solution 
of three extra sets of normal equations that have no function other than to obtain the 
values $\Sigma x_{c1.23}^2$, $\Sigma x_{c1.24}^2$, and $\Sigma x_{c1.34}^2$, or the building up of various simple and partial 
correlation coefficients which likewise may be of no direct interest. Consequently 
other measures of the importance of the individual factors are frequently used instead, 
which are much easier to compute. 

Perhaps the most common of these are the beta coefficients: 

\[ \beta_{12.34}, \quad \beta_{13.24}, \quad \text{and} \quad \beta_{14.23}. \]

The reader should not confuse these measures with $\beta_1$ and $\beta_2$, which were used to 
describe a frequency distribution. The two sets of measures are entirely different in 
nature. It will be recalled that the following relationship obtained in simple correlation: 

\[ r_{12} = b_{12} \frac{\sigma_2}{\sigma_1}. \]

Reasoning by analogy, we have: 

\[ \beta_{12.34} = b_{12.34} \frac{\sigma_2}{\sigma_1}; \]

\[ \beta_{13.24} = b_{13.24} \frac{\sigma_3}{\sigma_1}; \]

\[ \beta_{14.23} = b_{14.23} \frac{\sigma_4}{\sigma_1}. \]

The analogy is imperfect, since the standard deviations used with the net coefficient of 
estimation are gross measures of dispersion; that is, the variables have not been adjusted 
for variations in the other factors which have been held constant statistically. The 
four standard deviations are readily found. They are: $\sigma_1 = 4.617$; $\sigma_2 = 1.847$; 
$\sigma_3 = .9284$; $\sigma_4 = 33.20$. Substituting in the above formulae, we find: 

\[ \beta_{12.34} = 1.445 \cdot \frac{1.847}{4.617} = +.578; \]

\[ \beta_{13.24} = 2.429 \cdot \frac{.9284}{4.617} = +.488; \]

\[ \beta_{14.23} = .02711 \cdot \frac{33.20}{4.617} = +.195. \]

The rank of the $\beta$ coefficients in this case is the same as the partial coefficients. This 
will usually, though not always, be the case. Per cent male seems somewhat less 
important by this method, however. 
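The beta coefficients require nothing beyond the net coefficients of estimation and the gross standard deviations. A minimal Python sketch follows; the dictionary layout and names are assumptions made for the illustration, and the standard deviations are computed from the sums of squared deviations, with $N = 18$, rather than entered in rounded form.

```python
import math

n_regions = 18

# Sums of squared deviations for X1 (suicide rate), X2 (age),
# X3 (per cent male), and X4 (business failure index).
sum_sq = {1: 383.685, 2: 61.412, 3: 15.514, 4: 19843.520}
sigma = {k: math.sqrt(v / n_regions) for k, v in sum_sq.items()}

# Net coefficients of estimation b12.34, b13.24, b14.23.
b = {2: 1.445402, 3: 2.429389, 4: 0.02711406}

# Each beta restates the corresponding b in standard deviation units.
beta = {k: b[k] * sigma[k] / sigma[1] for k in b}

for k in sorted(beta):
    print(k, round(sigma[k], 4), round(beta[k], 3))
```

Ranking the variables by the size of `beta` reproduces the order given in the text: age first, per cent male second, business failures third.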

Two other measures of individual importance are sometimes used. The coefficients 
of separate determination split up the expression for $R_{1.234}^2$ as used on page 766; hence 
we have 

\[ R_{1.234}^2 = \frac{\Sigma x_{c1.234}^2}{\Sigma x_1^2} = \frac{b_{12.34} \Sigma x_1 x_2 + b_{13.24} \Sigma x_1 x_3 + b_{14.23} \Sigma x_1 x_4}{\Sigma x_1^2} \]

split into three components: 

\[ d_{12.34}^2 = \frac{b_{12.34} \Sigma x_1 x_2}{\Sigma x_1^2}; \qquad d_{13.24}^2 = \frac{b_{13.24} \Sigma x_1 x_3}{\Sigma x_1^2}; \qquad d_{14.23}^2 = \frac{b_{14.23} \Sigma x_1 x_4}{\Sigma x_1^2}. \]

The sum of three coefficients of separate determination, therefore, equals the coefficient 
of multiple determination. These separate coefficients, however, are thought to be more 
subject to random error than the $\beta$ coefficients; furthermore, each includes part of the 
joint determination of the other two independent variables. Another disadvantage of 
this coefficient is that the value of $d^2$ may be negative and thus a coefficient of separate 
correlation $d$ cannot be obtained. (See Mordecai Ezekiel, Methods of Correlation 
Analysis, pp. 380-383, John Wiley and Sons, New York, 1930.) Another measure of 
individual importance, not widely used as yet but which Ezekiel recommends, is the 
coefficient of part correlation. This coefficient measures the correlation between an 
independent variable and the dependent variable, the latter only having been adjusted 
for net variations in the other independent variables. Perhaps the relationship between 
multiple, partial, and part correlation will be clearer if we think of them as follows (in 
terms of a 4-variable problem): 

Multiple correlation: $R_{1.234}$ = simple correlation of 

$X_1$ with $X_{c1.234}$. 

Partial correlation: $r_{12.34}$ = simple correlation of 

$[X_2 - b_{23.4} X_3 - b_{24.3} X_4]$ with $[X_1 - b_{13.4} X_3 - b_{14.3} X_4]$. 

Part correlation: ${}_{12}r_{34}$ = simple correlation of 

$X_2$ with $[X_1 - b_{13.24} X_3 - b_{14.23} X_4]$. 

See also Ezekiel, ibid., pp. 181-183. 

Estimate of Correlation in the Population 

As is the case with simple linear or non-linear, so with multiple or 
partial correlation it may be desirable to estimate the correlation that exists 
in the population. The formula to use for multiple correlation is identical 
with the one with which we are familiar: 

\[ \bar{R}_{1.234 \ldots m}^2 = \frac{R_{1.234 \ldots m}^2 (N - 1) - (m - 1)}{N - m}, \]

where $m$ is the number of degrees of freedom lost; that is, the number of 
constants in the estimating equation, including $a_{1.234 \ldots m}$. In the present 
instance we have for the multiple correlation coefficient 

\[ \bar{R}_{1.234}^2 = \frac{.8159(18 - 1) - (4 - 1)}{18 - 4} = .7764. \]

\[ \bar{R}_{1.234} = .8811. \]

For the partial correlation coefficient ($\bar{r}_{14.23}$, for example), we have a 
slightly different expression:⁷ 

\[ \bar{r}_{14.23 \ldots m}^2 = \frac{r_{14.23 \ldots m}^2 (N - m + 1) - 1}{N - m}. \]

Applying this formula, we have 

\[ \bar{r}_{14.23}^2 = \frac{.1329(18 - 4 + 1) - 1}{18 - 4} = .0710. \]

\[ \bar{r}_{14.23} = +.2665. \]
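Both adjustments are simple enough to state as small functions. The Python sketch below is illustrative; the function names are assumptions, and the inputs are the sample values used above ($N = 18$, $m = 4$).

```python
import math

def adjusted_multiple_r_sq(r_sq, n, m):
    """Estimate of the population coefficient of multiple determination,
    where m is the number of constants in the estimating equation."""
    return (r_sq * (n - 1) - (m - 1)) / (n - m)

def adjusted_partial_r_sq(r_sq, n, m):
    """Estimate of the population coefficient of partial determination."""
    return (r_sq * (n - m + 1) - 1) / (n - m)

multiple_sq = adjusted_multiple_r_sq(0.8159, 18, 4)
partial_sq = adjusted_partial_r_sq(0.1329, 18, 4)

print(round(multiple_sq, 4), round(math.sqrt(multiple_sq), 4))
print(round(partial_sq, 4), round(math.sqrt(partial_sq), 4))
```

It may be noted how heavily the adjustment bears on a small partial coefficient when $N$ is only 18: the squared coefficient falls from .1329 to about .071.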


Reliability of Coefficients 

Standard error of coefficients. Measures sometimes used to test the 
reliability of multiple and partial correlation coefficients are analogous to 
the formula for $\sigma_r$ employing the coefficient obtained from the sample, 

\[ \sigma_r = \frac{1 - r^2}{\sqrt{N - m}}. \]

For multiple correlation this may be stated 

\[ \sigma_{R_{1.234 \ldots m}} = \frac{1 - R_{1.234 \ldots m}^2}{\sqrt{N - m}}, \]

and for partial correlation⁸ 

\[ \sigma_{r_{12.34 \ldots m}} = \frac{1 - r_{12.34 \ldots m}^2}{\sqrt{N - m}}. \]

As previously noted on page 681, such expressions are grossly inaccurate 
when $N$ is small, or when the value of $r$ is large. An additional limitation 
to $\sigma_R$ is that the sampling distribution of $R$ varies with the magnitude of $m$. 
It is therefore preferable to make use of more exact methods, such as the 
analysis of variance. 

7 This expression develops from the relationship 

\[ 1 - \bar{r}_{14.23}^2 = \frac{1 - \bar{R}_{1.234}^2}{1 - \bar{R}_{1.23}^2}, \]

as shown in Appendix B, Section XXIV-4. 

Analysis of variance. Let us summarize in tabular form some of the 
major results of our correlation analysis. All the needed data concerning 
variation will be found from the following expressions, which were used in 
computing various correlation coefficients and all of which have been given 
on preceding pages: 

\[ r_{12}^2 = \frac{229.971}{383.685} = .5994; \qquad r_{12} = +.7742. \]

\[ r_{13.2}^2 = \frac{72.246}{153.714} = .4700; \qquad r_{13.2} = +.6856. \]

\[ R_{1.23}^2 = \frac{302.217}{383.685} = .7877; \qquad R_{1.23} = .8875. \]

\[ r_{14.23}^2 = \frac{10.827}{81.468} = .1329; \qquad r_{14.23} = +.3646. \]

\[ R_{1.234}^2 = \frac{313.044}{383.685} = .8159; \qquad R_{1.234} = .9033. \]

Source of variation in X_1                       Amount of    Degrees of    Variance
                                                 variation    freedom

Gross amount explained by X_2                     229.971         1         229.971
Increment explained by addition of X_3             72.246         1          72.246
Total explained by X_2 and X_3                    302.217         2         151.108
Increment explained by addition of X_4             10.827         1          10.827
Total explained by X_2, X_3, and X_4              313.044         3         104.348
Residual, unexplained by X_2, X_3, and X_4         70.641        14           5.046
Total                                             383.685        17          22.570


8 If the population value of $r_{12.34 \ldots m}$ is used in the formula, the denominator
of this expression becomes $\sqrt{N - m + 1}$. The distribution of such sample
coefficients may be considered to approximate normality only when the population
coefficient is small and N is large.

We may now test the significance of $R_{1.234}$ by the use of

$$F = \frac{104.348}{5.046} = 20.679.$$

Appendix G2 indicates that for the .001 level of significance, when $n_1 = 3$
and $n_2 = 14$, F should equal 9.730. The results indicate that $R_{1.234}$ is
significant.

If we wish to test $r_{13.2}$, it is best to relate the increment in variance
attributable to variable 3 to the variance which is attributable to chance,
the latter being the variance not accounted for by all the independent
variables taken together. Thus we have

$$F = \frac{72.246}{5.046} = 14.317.$$

When $n_1 = 1$ and $n_2 = 14$, F should equal 8.862 for the .01 level of
significance, and 17.143 for the .001 level. Clearly $r_{13.2}$ is significant.
Notice that the unexplained variance is derived from 70.641 rather than
from 383.685 - 302.217 = 81.468. The latter quantity is not the variation
due to chance factors, but is the variation due to chance factors plus
variable 4.

If we had not made use of variable 4 in our correlation analysis, we
should have used, for the unexplained variance, 5.4313 = 81.468 ÷ 15.
We should then have computed

$$F = \frac{72.246}{5.4313} = 13.302.$$

We now have a smaller value for F; however, this is partially offset by
the fact that $n_2$ is 15 instead of 14, and F need not be so high for the same
level of significance. In the present case it is not obvious whether it is
more accurate to use for the unexplained variance that which remains
after employing variables 2, 3, and 4, or that which remains after using
variables 2 and 3 only, since the F test, as illustrated in the following
paragraph, fails to show that the additional explanation attributed to variable 4
is significant. In general, however, the test using fewer independent
variables is not so accurate as the test using more, and the former may
erroneously fail to show significance for the factor being tested.

To test the significance of $r_{14.23}$, we compute

$$F = \frac{10.827}{5.046} = 2.146.$$

The F table for P = .05, when $n_1 = 1$ and $n_2 = 14$, requires that F = 4.600.
Therefore, $r_{14.23}$ cannot be regarded as significant.
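The F ratios of this section can be recomputed directly from the amounts of variation in the table. The sketch below is illustrative only; the dictionary keys and function name are assumptions, and the critical values are simply those quoted in the text from Appendix G2, not values computed here.

```python
N = 18
total_variation = 383.685
# Cumulative amounts of variation in X1 explained as variables are added.
explained = {"X2": 229.971, "X2,X3": 302.217, "X2,X3,X4": 313.044}


def f_ratio(variation: float, degrees: int, error_variance: float) -> float:
    """Variance attributable to a source divided by the chance variance."""
    return (variation / degrees) / error_variance


# Chance variance after using variables 2, 3, and 4 (four constants fitted).
residual_variance = (total_variation - explained["X2,X3,X4"]) / (N - 4)

f_multiple = f_ratio(explained["X2,X3,X4"], 3, residual_variance)
f_x3 = f_ratio(explained["X2,X3"] - explained["X2"], 1, residual_variance)
f_x4 = f_ratio(explained["X2,X3,X4"] - explained["X2,X3"], 1, residual_variance)

# Alternative test of variable 3 when variable 4 is never used (three constants).
variance_without_x4 = (total_variation - explained["X2,X3"]) / (N - 3)
f_x3_without_x4 = f_ratio(explained["X2,X3"] - explained["X2"], 1, variance_without_x4)

# Critical values as quoted in the text.
print("R1.234:", round(f_multiple, 3), "exceeds 9.730:", f_multiple > 9.730)
print("r13.2: ", round(f_x3, 3), "exceeds 8.862:", f_x3 > 8.862)
print("r14.23:", round(f_x4, 3), "exceeds 4.600:", f_x4 > 4.600)
print("r13.2 without X4:", round(f_x3_without_x4, 3))
```

The small differences in the third decimal place from the figures in the text arise because the text divides by the rounded variance 5.046.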

The significance of a partial correlation coefficient may also be tested
by use of the t table, and the conclusions from such a test agree with the
analysis of variance test. We compute

$$t = \frac{r_{12.3 \ldots m}\sqrt{N - m}}{\sqrt{1 - r^2_{12.3 \ldots m}}}.$$

Using the t test to ascertain the significance of $r_{13.2}$, we have

$$t = \frac{.6856\sqrt{18 - 3}}{\sqrt{1 - .4700}} = 3.647,$$

which, according to the t table, lies beyond the .01 level of significance
and is in agreement with the F test. It is interesting to note that the
value of t is the square root of the value of F, when $n_1 = 1$.
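The agreement between the t test and the F test can be verified numerically. This is a minimal sketch under one stated assumption: for $r_{13.2}$ the number of constants is taken as m = 3 (variables 2 and 3 only), so that the comparison is with F = 13.302, the value obtained when variable 4 is not used. The function name is illustrative.

```python
import math


def t_from_partial(r: float, n: int, m: int) -> float:
    """t = r * sqrt(N - m) / sqrt(1 - r^2) for a partial correlation coefficient."""
    return r * math.sqrt(n - m) / math.sqrt(1.0 - r * r)


r_13_2 = math.sqrt(0.4700)  # r13.2, from r^2 = .4700 in the text
t_value = t_from_partial(r_13_2, n=18, m=3)

f_value = 72.246 / (81.468 / 15)  # F for variable 3 when variable 4 is not used

print("t   =", round(t_value, 3))
print("t^2 =", round(t_value ** 2, 3))
print("F   =", round(f_value, 3))
```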

We may also transform a partial correlation coefficient into Z in order
to discover if the coefficient differs significantly from some known or
hypothetical population value, or from some other observed correlation
coefficient. The general formula is

$$Z = 1.15129 \log_{10} \frac{1 + r}{1 - r},$$

with standard error 

$$\sigma_Z = \frac{1}{\sqrt{N - m - 1}}.$$
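The constant 1.15129 in the Z formula is one half of the natural logarithm of 10, so the transformation is the inverse hyperbolic tangent of r. The sketch below, which is an illustration rather than part of the original text, applies the formula to $r_{14.23} = .3646$ and compares it with `math.atanh`; the function name is an assumption.

```python
import math


def z_transform(r: float) -> float:
    """Z = 1.15129 * log10((1 + r) / (1 - r)), as given in the text."""
    return 1.15129 * math.log10((1.0 + r) / (1.0 - r))


r_14_23 = 0.3646
z_value = z_transform(r_14_23)

print("Z from the text's formula:", round(z_value, 4))
print("atanh(r) for comparison:  ", round(math.atanh(r_14_23), 4))
```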


Multiple Curvilinear Correlation 

Transformation to linear form. As was found true with gross relationships,
so the net relationship between a dependent variable and one or
more independent variables is sometimes non-linear. Sometimes it is
possible to reduce such non-linear relationships to linear form by using
logarithms or reciprocals (or possibly some other function) of one or more
variables. Thus, with three variables, we might have an estimating
equation of one of the following types:


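$$X_{c1.23} = a_{1.23} + b_{12.3} \log X_2 + b_{13.2} X_3;$$

$$X_{c1.23} = a_{1.23} + b_{12.3} \log X_2 + b_{13.2} \ldots X_3;$$

$$X_{c1.23} = a_{1.23} + b_{12.3} \log X_2 + b_{13.2} \frac{1}{X_3};$$

$$\log X_{c1.23} = \log a_{1.23} + X_2 \log b_{12.3} + b_{13.2} \ldots;$$

$$\log X_{c1.23} = \log a_{1.23} + X_2 \log b_{12.3} + X_3 \log b_{13.2};$$

$$\log \log X_{c1.23} = \log a_{1.23} + X_2 \log b_{12.3} + X_3 \log b_{13.2}.$$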


The above types are, of course, but six of a number of possible combinations,
a proper choice among which should, in perhaps a majority of cases,
result in empirical curves satisfactory for purposes of estimation. It
would not, however, be possible to transform into a linear equation a
relationship in which the logs of the dependent variable were related to the
second variable and the reciprocals of the dependent variable were related
to the third variable.
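To illustrate how a transformation reduces a non-linear relationship to linear form, the sketch below takes the type in which the logarithm of the dependent variable is a linear function of the two independent variables. The data are synthetic and the constants (5.0, 1.2, 0.9) are hypothetical values chosen only for the demonstration; NumPy's least-squares routine stands in for the hand solution of normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
x2 = rng.uniform(1.0, 10.0, size=30)
x3 = rng.uniform(1.0, 10.0, size=30)

# Hypothetical constants of the relationship X1 = a * b12**X2 * b13**X3.
a, b12, b13 = 5.0, 1.2, 0.9
x1 = a * b12 ** x2 * b13 ** x3

# Taking logarithms gives log X1 = log a + X2 log b12 + X3 log b13,
# which is linear in the constants and can be fitted by least squares.
design = np.column_stack([np.ones_like(x2), x2, x3])
log_constants, *_ = np.linalg.lstsq(design, np.log10(x1), rcond=None)

# Antilogs recover the constants of the original equation.
a_hat, b12_hat, b13_hat = 10.0 ** log_constants
print(a_hat, b12_hat, b13_hat)
```

Because the synthetic values contain no scatter, the fitted constants reproduce the assumed ones; with real data the fit minimizes squared deviations of the logarithms, not of the original values.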

Use of polynomials. Even more flexible is the use of polynomials, and
probably more useful as an exploratory tool in that it is not necessary to
have a very precise hypothesis concerning the nature of the relationship
among the variables before undertaking the correlation. Thus we might,
using three variables, start with the equation type

$$X_{c1.23} = a_{1.23} + b_{12.3} X_2 + b_{13.2} X_3;$$

then using degrees of freedom compute estimates of explained variance
$\left(\dfrac{\Sigma x^2_{c1.23}}{2}\right)$ and of unexplained variance
$\left(\dfrac{\Sigma x^2_{s1.23}}{N - 3}\right)$, and test for significance,
using the F or z table.

Then we could include also the squares of $X_2$ in our equation, in this
fashion:

$$X_{c1.22'3} = a_{1.22'3} + b_{12.2'3} X_2 + b_{12'.23} X_2^2 + b_{13.22'} X_3.$$

To test whether the use of the second powers of $X_2$ has significantly
reduced the variance, we should relate the increase in the explained variance

$$\Sigma x^2_{c1.22'3} - \Sigma x^2_{c1.23}$$

to the variance that is still unexplained

$$\frac{\Sigma x^2_{s1.22'3}}{N - 4}.$$

If the test indicates that the reduction in variance is significant, we should
conclude that it is worth while to use the additional constant $b_{12'.23}$. In
similar fashion we could utilize the squares of $X_3$, or higher powers of
both $X_2$ and $X_3$.

Of course, it is not always necessary to go through all the labor suggested
in the preceding paragraph. The statistician can frequently decide on
economic or other non-mathematical grounds the type of relationship
which exists among the variables. Thus economists are of the opinion
that demand curves usually slope downward to the right, and are concave
upward. This would indicate that the second powers of the price series
should be used if the response of purchasers to various prices is to be
estimated by the use of simple polynomials. In the illustration that follows,
millings of wheat ($X_1$) are to be correlated with the price of flour ($X_2$)
and index numbers of income of industrial workers ($X_3$). The data were
assembled by the Bureau of Agricultural Economics of the United States
Department of Agriculture;⁹ and the net relationship between $X_1$ and
$X_2$, with $X_3$ held constant, was estimated by this Bureau in order to aid
in determining the effect of a processing tax on wheat acreage. The
presumption was that, if it could be shown that the demand for flour, and
hence for wheat, by millers was inelastic (only slightly affected by the
price of flour), farmers would not find it necessary to reduce materially
their wheat acreage.

9 Instead of adopting the usual practice, when correlating time series, of expressing
the data as percentages of trend, we are following the procedure used by the Bureau
of Agricultural Economics in analyzing this problem. The Bureau did not adjust for
trend, apparently on the assumption that the trends were approximately horizontal,
and that the period was too short to permit of accurate trend measurement.


TABLE 176

Wheat Milled, Price of Flour, and Index Numbers of Income of Industrial
Workers, by Years, 1924-1935

Year          Wheat milled     Average price of     Income of industrial
beginning     (millions of     flour per barrel     workers
July 1        bushels)         (dollars)            (1924-1929 = 100)
              X_1              X_2                  X_3

1924              475               7.94                  94.0
1925              490               8.36                 100.5
1926              493               7.42                 101.8
1927              494               7.36                  98.5
1928              500               6.29                 102.6
1929              496               6.48                 100.8
1930              481               4.78                  77.3
1931              474               3.84                  56.8
1932              481               3.86                  40.6
1933              435               6.47                  54.7
1934              443               6.66                  60.9
1935              460               6.78                  68.8

Total           5,722              76.24                 957.3

Source: United States Department of Agriculture, Bureau of Agricultural Economics. Published in
pamphlet of United States Treasury Department, An Analysis of the Effects of the Processing Taxes Levied
Under the Agricultural Adjustment Act, 1937, p. 84. Wheat milled is that milled for domestic consumption.
Average price is a simple average of winter wheat straights, Kansas City, and spring wheat family patents,
Minneapolis. Index numbers of income are for year beginning June 1.


The data are shown in Table 176, and the gross relationship between
$X_1$ and $X_2$, and between $X_1$ and $X_3$, may be noted by reference to Chart
238. Although the milling of wheat seems to be directly related to income
of industrial workers, and the relationship is apparently linear, scarcely
any relationship is discernible between wheat milled and the price of flour.
But using our knowledge of the usual shape of demand curves, we may
hypothecate the following type of relationship:

$$X_{c1.22'3} = a_{1.22'3} + b_{12.2'3} X_2 + b_{12'.23} X_2^2 + b_{13.22'} X_3.$$

The normal equations required are:

$$\text{I.} \quad \Sigma X_1 = N a_{1.22'3} + b_{12.2'3} \Sigma X_2 + b_{12'.23} \Sigma X_2^2 + b_{13.22'} \Sigma X_3;$$

$$\text{II.} \quad \Sigma X_1 X_2 = a_{1.22'3} \Sigma X_2 + b_{12.2'3} \Sigma X_2^2 + b_{12'.23} \Sigma X_2^3 + b_{13.22'} \Sigma X_2 X_3;$$

$$\text{III.} \quad \Sigma X_1 X_2^2 = a_{1.22'3} \Sigma X_2^2 + b_{12.2'3} \Sigma X_2^3 + b_{12'.23} \Sigma X_2^4 + b_{13.22'} \Sigma X_2^2 X_3;$$

$$\text{IV.} \quad \Sigma X_1 X_3 = a_{1.22'3} \Sigma X_3 + b_{12.2'3} \Sigma X_2 X_3 + b_{12'.23} \Sigma X_2^2 X_3 + b_{13.22'} \Sigma X_3^2.$$
A computation table of the various sums and product sums will not be
shown, since no new principle is involved, but on page 782 is shown a
check on the accuracy of the computations. To find a particular product
sum, we locate the multiplicand in the stub and read across to the
appropriate multiplier column. Thus it will be seen that the value of $\Sigma X_2^2 X_3$,
which is 43,544.6876, is found at the intersection of row $X_2^2$ and column
$X_3$. The dotted lines indicate the items that are totaled to check with
the $X_3$ column.

[Chart 238. Scatter Diagrams of Gross Relationship of Price of Flour and Income
of Industrial Workers with Amount of Wheat Milled, by Years, 1924-1935. (Data of
Table 176.) Vertical axes: wheat milled $X_1$, millions of bushels. Horizontal axes:
price of flour $X_2$, dollars per barrel; income of industrial workers $X_3$, index
numbers, 1924-1929 = 100.]

From the computations on page 782 the normal equations are formed:

$$\text{I.} \quad 5{,}722 = 12 a_{1.22'3} + 76.24 b_{12.2'3} + 508.3922 b_{12'.23} + 957.3 b_{13.22'}.$$

$$\text{II.} \quad 36{,}380.51 = 76.24 a_{1.22'3} + 508.3922 b_{12.2'3} + 3{,}514.2712 b_{12'.23} + 6{,}335.683 b_{13.22'}.$$

$$\text{III.} \quad 242{,}854.1787 = 508.3922 a_{1.22'3} + 3{,}514.2712 b_{12.2'3} + 24{,}947.444 b_{12'.23} + 43{,}544.6876 b_{13.22'}.$$

$$\text{IV.} \quad 460{,}092.5 = 957.3 a_{1.22'3} + 6{,}335.683 b_{12.2'3} + 43{,}544.6876 b_{12'.23} + 81{,}973.37 b_{13.22'}.$$

Solving the normal equations simultaneously gives the estimating
equation:

$$X_{c1.22'3} = 558.4216 - 48.34186 X_2 + 3.145498 X_2^2 + 1.156772 X_3.$$
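The simultaneous solution of the four normal equations can be reproduced by machine. The following is a minimal sketch, assuming NumPy is available; the sums and product sums are those of the normal equations for the wheat data (with 5,722 as the total of $X_1$ from Table 176), and the variable names are illustrative.

```python
import numpy as np

# Coefficients of a, b12.2'3, b12'.23, and b13.22' in normal equations I-IV.
coefficients = np.array([
    [12.0,      76.24,      508.3922,    957.3],
    [76.24,     508.3922,   3514.2712,   6335.683],
    [508.3922,  3514.2712,  24947.444,   43544.6876],
    [957.3,     6335.683,   43544.6876,  81973.37],
])
# Left-hand sides: sums of X1, X1*X2, X1*X2^2, and X1*X3.
left_sides = np.array([5722.0, 36380.51, 242854.1787, 460092.5])

a, b_price, b_price_squared, b_income = np.linalg.solve(coefficients, left_sides)

print("a       =", a)
print("b12.2'3 =", b_price)
print("b12'.23 =", b_price_squared)
print("b13.22' =", b_income)
```

The matrix is symmetric, as it must be for normal equations, and the second and third columns are strongly related because $X_2$ and $X_2^2$ move together; small rounding differences in the last printed digits of the constants are therefore to be expected.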




The other measures of relationship are now easily found in the usual
fashion. For the explained sums of squares we have

$$\Sigma X^2_{c1.22'3} = a_{1.22'3} \Sigma X_1 + b_{12.2'3} \Sigma X_1 X_2 + b_{12'.23} \Sigma X_1 X_2^2 + b_{13.22'} \Sigma X_1 X_3 = 2{,}732{,}705.$$

The total sum of squares is $\Sigma X_1^2 = 2{,}733{,}298$. The usual correction
factor is $\bar{X}_1 \Sigma X_1 = 2{,}728{,}440$. From these values we may compute the
different measures of variation:

Total:        $\Sigma x_1^2 = \Sigma X_1^2 - \bar{X}_1 \Sigma X_1 = 2{,}733{,}298 - 2{,}728{,}440 = 4{,}858.$

Explained:    $\Sigma x^2_{c1.22'3} = \Sigma X^2_{c1.22'3} - \bar{X}_1 \Sigma X_1 = 2{,}732{,}705 - 2{,}728{,}440 = 4{,}265.$

Unexplained:  $\Sigma x^2_{s1.22'3} = \Sigma X_1^2 - \Sigma X^2_{c1.22'3} = 2{,}733{,}298 - 2{,}732{,}705 = 593.$

We therefore have

$$\sigma^2_{s1.22'3} = \frac{\Sigma x^2_{s1.22'3}}{N} = \frac{593}{12} = 49.42, \quad \text{and} \quad \sigma_{s1.22'3} = 7.03.$$

$$R^2_{1.22'3} = \frac{\Sigma x^2_{c1.22'3}}{\Sigma x_1^2} = \frac{4{,}265}{4{,}858} = .8779, \quad \text{and} \quad R_{1.22'3} = .937.$$
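The measures of variation can also be obtained directly from the observations of Table 176. The sketch below is an illustration, not the computation used in the text: it fits the linear and the second-degree equations by NumPy's least-squares routine and then relates the increase in explained variation to the variance still unexplained, in the manner described under "Use of polynomials." The resulting F ratio is not given in the text.

```python
import numpy as np

# Table 176: wheat milled (X1), price of flour (X2), income index (X3).
x1 = np.array([475, 490, 493, 494, 500, 496, 481, 474, 481, 435, 443, 460], dtype=float)
x2 = np.array([7.94, 8.36, 7.42, 7.36, 6.29, 6.48, 4.78, 3.84, 3.86, 6.47, 6.66, 6.78])
x3 = np.array([94.0, 100.5, 101.8, 98.5, 102.6, 100.8, 77.3, 56.8, 40.6, 54.7, 60.9, 68.8])
n = len(x1)


def unexplained_variation(design: np.ndarray, y: np.ndarray) -> float:
    """Sum of squared deviations from the least-squares estimating equation."""
    constants, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.sum((y - design @ constants) ** 2))


ones = np.ones(n)
total = float(np.sum((x1 - x1.mean()) ** 2))
unexplained_linear = unexplained_variation(np.column_stack([ones, x2, x3]), x1)
unexplained_quadratic = unexplained_variation(np.column_stack([ones, x2, x2 ** 2, x3]), x1)

r_linear = np.sqrt(1.0 - unexplained_linear / total)
r_quadratic = np.sqrt(1.0 - unexplained_quadratic / total)
sigma_s = np.sqrt(unexplained_quadratic / n)  # standard error of estimate, dividing by N

# Increase in explained variation from adding X2 squared (one degree of freedom),
# related to the variance still unexplained (N - 4 degrees of freedom).
f_increment = (unexplained_linear - unexplained_quadratic) / (unexplained_quadratic / (n - 4))

print("total variation:", round(total, 1))
print("R linear:", round(float(r_linear), 3), " R quadratic:", round(float(r_quadratic), 3))
print("standard error of estimate:", round(float(sigma_s), 2))
print("F for the squared term:", round(f_increment, 2))
```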


$R_{1.23}$, computation of which is not shown, is .889. Although the non-linear
correlation is very high, it must be remembered that we had only
twelve observations and our four constants, $a_{1.22'3}$, $b_{12.2'3}$, $b_{12'.23}$, and
$b_{13.22'}$, have used up four degrees of freedom, so that we have only eight
degrees of freedom left. Furthermore, Chart 239 shows that the mathematical
equation of the net relationship between the price of flour and the
amount of wheat milled, when income of industrial workers is held constant,
is not strictly logical, for the solid curve on this chart turns up
after a price of about \$7.50 per barrel is reached. The equation of this
curve of net relationship is

$$X_{c1.22'(3)} = 650.704 - 48.34186 X_2 + 3.145498 X_2^2,$$

in which $X_{c1.22'(3)}$ refers to a value of variable $X_1$ (wheat millings) as
estimated from the curvilinear relationship with variable $X_2$ (price of
flour), after allowing for the effect of variations in $X_3$ (income of industrial
workers). The symbol (3) indicates that $X_3$ has been held constant
statistically. The net estimating equation was obtained by substituting
79.775 (i.e., $\bar{X}_3$) for $X_3$ in our original estimating equation. The broken
line on this chart, which represents a more logical relationship, was obtained
by a graphic process which will be described in the final section of
this chapter.
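The net estimating equation, and the price at which its curve begins to turn up, follow from the constants of the estimating equation by simple arithmetic. The sketch below is illustrative; the function and variable names are assumptions.

```python
# Constants of the estimating equation for wheat milled.
a, b_price, b_price_squared, b_income = 558.4216, -48.34186, 3.145498, 1.156772

mean_income = 957.3 / 12  # mean of X3 from the total in Table 176

# Holding X3 at its mean absorbs the income term into the constant.
net_constant = a + b_income * mean_income

# A second-degree curve reaches its lowest point where the slope is zero.
turning_price = -b_price / (2.0 * b_price_squared)


def net_estimate(price: float) -> float:
    """Estimated wheat milled at a given flour price, income held at its mean."""
    return net_constant + b_price * price + b_price_squared * price ** 2


print("mean of X3:", mean_income)
print("net constant:", round(net_constant, 3))
print("price at which the curve turns up:", round(turning_price, 2))
print("estimate at the highest observed price, 8.36:", round(net_estimate(8.36), 1))
```

The turning point falls within the range of observed prices, which is why the fitted curve rises at the right-hand side of the chart.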

Before leaving this section, however, it is worth noting that it is possible
to use reciprocals or logarithms of some of the variables and at the same
time utilize the higher powers of any or all of the independent variables,
whether they are in original form, or transformed into reciprocals or
logarithms. This possibility, of course, materially increases the flexibility of
this approach.

A graphic approach. Statisticians in the United States Department of
Agriculture have developed an extremely flexible technique by which
curves of net relationship and a coefficient of multiple correlation may be
obtained through successive approximations by means of charts and
mathematics no more advanced than simple arithmetic. While this method has
distinct limitations, it is useful as an exploratory tool in determining the
appropriate type of equation to fit by mathematical methods.

Chart 240 contains ordinary scatter diagrams of the relationship between price
of flour and amount of wheat milled, but with additional information included also.
Remembering that a curve of net relationship is supposed to represent the
relationship between two variables when certain other variables are held constant, we
can obtain a first approximation to the net relationship between the milling of
wheat and the price of flour by proceeding as follows: First, we note from Table 176
that in the years 1925-1929 inclusive, the income of industrial workers was
substantially constant. Those years are indicated on section A of Chart 240 by
black triangles. Through these triangles a broken line is fitted freehand. The
level of industrial income, though by no means constant between 1930 and
1935, was at a distinctly lower level than for the earlier period. Consequently,
these observations are shown by black dots, and a second broken
line is fitted freehand to them. The year 1924 was intermediate between
these groups with respect to industrial income, and is shown by a hollow
triangle. Had there been other years falling naturally into a group with
1924, we should have fitted a third line. Using our two broken lines as
guides, we have drawn in the solid curved line, which is our first approximation
(line I) of the net functional relationship between $X_1$ and $X_2$. The
reason the line is drawn as a curve is partly because this is the type of
relationship we should logically expect, but mainly because our chart has
two broken lines, the steeper one passing through the dots (which are
mainly on the left-hand side of the chart) and the flatter one passing
through the triangles (which are mainly on the right-hand side of the chart).
It is not necessary that the first approximation be a good fit. Successive
approximations, if made with skill, will correct the original discrepancies;
however, the more accurate the first approximation is made, the less
laborious is the completion of the procedure.

[Chart 239. Least Squares Estimate and Graphic Estimate of Net Relationship Between
Price of Flour and Amount of Wheat Milled. (The dots are placed on the chart to represent
deviations from the mathematically fitted curve. Derived from data of Table 176.)
Vertical axis: wheat milled $X_1$, millions of bushels.]

A slightly different technique is shown in section B of Chart 240. It
is not always possible to divide the data into natural groups; but we can
connect dots representing the different observations in the sequence of
their rank with respect to the third variable. Thus the various years
rank from lowest to highest, with respect to income of industrial workers,
as follows: 1932; 1933; 1931; 1934; 1935; 1930; 1924; 1927; 1925; 1929;
1926; 1928. Dots representing those years have been connected in that
order by broken lines. Wherever large gaps occur with respect to values of
$X_3$, the broken lines may be omitted. Thus we might not have connected
1924 with either 1930 or 1926. The broken lines in this chart are
intended as a guide to drawing in the first approximation line. With the
exception of the lines connecting 1934 and 1935, and 1924 and 1927, the
picture is clear. The solid line (I) is identical with that of section A.
Had there been four variables, we should have connected by dotted lines
only those observations which were approximately the same with respect
to the variables $X_3$ and $X_4$. A scatter diagram of these two variables
would have aided in locating such observations. Had there been also a
fifth variable, the number of preliminary guide lines would have been
restricted to observations similar with respect to $X_3$, $X_4$, and $X_5$. As the
number of variables is increased, there are fewer and fewer dots that can
be connected, and these dots are more and more difficult to find.

[Chart 240. First Approximation to Net Non-Linear Relationship Between Price of
Flour ($X_2$) and Amount of Wheat Milled ($X_1$), as Obtained by Two Different Techniques.
Sections A and B. Vertical axes: wheat milled $X_1$, millions of bushels. Horizontal
axes: price of flour, dollars per barrel.]

Still another method of obtaining a first approximation line is to use
the equation obtained by the ordinary mathematical multiple correlation
approach.

The vertical distances between the different observations and the solid
line (I) of Chart 240 represent the individual variations in $X_1$ that have
not been explained by the different values of $X_2$. These values are now
read from the chart and plotted against the values of $X_3$. Thus line a of Chart
241 is the same length as line a of Chart 240. It is apparent that a straight line
fits these residuals very well, except for the 1932 observation. A straight line
is therefore drawn in by inspection as a first approximation (line II) of the
relationship between wheat milled (adjusted for the price of flour) and income
of industrial workers.

Deviations from the curve of Chart 
241 are now plotted above and be- 
low the first approximation curve (I) 
of the net relationship between Xi 
and X2. This is done on Chart 242; the first approximation curve (I) is 
shown here as a thin solid line. Note that line b of Chart 242 is the 
same length as line b of Chart 241. It is apparent that our first approxi- 
mation curve (I) needs to be corrected. The heavy line represents a sec- 
ond approximation (I'). 

A further (and in this case, final) step is to plot deviations from the
second approximation curve (I') of the net relationship of $X_1$ to $X_2$ above
and below the first approximation curve (II) of the net relationship between
wheat milled and income of industrial workers. This is done in
Chart 243. Note this time that line c of Chart 243 is the same length as
line c of Chart 242. Although the 1932 observation is still too far off,
there is not adequate reason for changing the shape of the first approximation
curve, and so II' is the same as II. Had this curve been changed,
it would have been necessary to proceed to obtain third approximation
curves. It should be noted that the scatter about the estimating lines
becomes smaller and smaller as the estimated effect of additional variables
is removed, or as successive approximations of any relationship are made.
The scatter can, of course, be reduced to zero if extremely complex curves
are used; how far to go is a matter of judgment.

[Chart 241. First Approximation to Net Relationship Between Income of
Industrial Workers ($X_3$) and Deviations from the Mean in Amount of
Wheat Milled. (Vertical deviations from zero on this chart are deviations
from curve I of Chart 240 A or B.)]

The final lines of relationship in these 
charts are not least square fits. Such lines 
cannot be obtained by purely graphic meth- 
ods, although it is possible to so draw the 
curves on each chart that the deviations 
will total zero. The additional labor in- 
volved in adjusting the curves to do this 
exactly is probably not justified; it can be 
done by inspection with sufficient accuracy 
for most purposes. Reference to Chart 239 
indicates that the average height of the 
curves determined by the two methods, 
mathematical and graphical, is about the 
same, and the patterns of the two are in 
substantial agreement, except that the math- 
ematical curve turns up at the right. 
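The successive approximations of the graphic method have a computational analog, usually called backfitting, in which each curve is refitted in turn to the deviations left by the other. The sketch below is that analog and not the freehand procedure of this section: as an assumption made only for illustration, a second-degree least-squares curve stands in for the freehand curve I and a least-squares straight line for line II, using the data of Table 176.

```python
import numpy as np

# Table 176: wheat milled (X1), price of flour (X2), income index (X3).
x1 = np.array([475, 490, 493, 494, 500, 496, 481, 474, 481, 435, 443, 460], dtype=float)
x2 = np.array([7.94, 8.36, 7.42, 7.36, 6.29, 6.48, 4.78, 3.84, 3.86, 6.47, 6.66, 6.78])
x3 = np.array([94.0, 100.5, 101.8, 98.5, 102.6, 100.8, 77.3, 56.8, 40.6, 54.7, 60.9, 68.8])


def smooth(x: np.ndarray, y: np.ndarray, degree: int) -> np.ndarray:
    """Least-squares polynomial readings, standing in for a freehand curve."""
    return np.polyval(np.polyfit(x, y, degree), x)


level = x1.mean()
curve_price = np.zeros_like(x1)   # readings from the price curve (I, I', ...)
curve_income = np.zeros_like(x1)  # readings from the income line (II, II', ...)
unexplained = []

for _ in range(200):
    # Refit each curve to the deviations not accounted for by the other.
    curve_price = smooth(x2, x1 - level - curve_income, 2)
    curve_income = smooth(x3, x1 - level - curve_price, 1)
    residual = x1 - level - curve_price - curve_income
    unexplained.append(float(np.sum(residual ** 2)))

# Direct least-squares fit of the same equation type, for comparison.
design = np.column_stack([np.ones_like(x1), x2, x2 ** 2, x3])
constants, *_ = np.linalg.lstsq(design, x1, rcond=None)
direct = float(np.sum((x1 - design @ constants) ** 2))

print("unexplained variation after first round:", round(unexplained[0], 1))
print("unexplained variation after last round: ", round(unexplained[-1], 1))
print("unexplained variation, direct solution:  ", round(direct, 1))
```

With least-squares curves the rounds settle on the mathematical solution; freehand curves need not, which is the source of the flexibility and of the subjectivity of the graphic method.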

Milling estimates for the different years must be computed by the addition of
readings from curves I' and II'. The tabulation for 1924 will be:

Source                         Curve reading

Curve I' (Chart 242)               464.9
Curve II' (Chart 243)               15.7

Estimate                           480.6

Since the wheat milled in 1924 actually was 475.0, the residual is
475.0 - 480.6 = -5.6 for that year. Unexplained variations for the
other years can be computed in similar fashion, or they may be read
directly from either of the final approximation charts (Chart 242 or 243).
The standard error of estimate may now be obtained by squaring each
deviation ($x_{s1.2'3}$), summing the squares, dividing by N, and extracting
the square root. (The subscript 2' here indicates an unspecified non-linear
relationship with $X_2$.)

[Chart 242. Second Approximation to Net Relationship Between Price of Flour ($X_2$) and
Amount of Wheat Milled ($X_1$). (Deviations from curve II are plotted around curve I.)
Vertical axis: wheat milled $X_1$, millions of bushels. Horizontal axis: dollars per barrel.]

The coefficient of curvilinear correlation can easily be computed thus:

$$R^2_{1.2'3} = 1 - \frac{\Sigma x^2_{s1.2'3}}{\Sigma x_1^2} = 1 - \frac{546.28}{4{,}858} = .8876.$$

$$R_{1.2'3} = .942.$$
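The arithmetic of the graphic estimate can be set out briefly. The sketch below uses only figures given in the text (the two curve readings for 1924, the unexplained variation 546.28, and the total variation 4,858); the variable names are illustrative, and the standard error of estimate it prints is computed here and is not stated in the text.

```python
import math

# Estimate for 1924: sum of readings from the two final curves.
estimate_1924 = 464.9 + 15.7
residual_1924 = 475.0 - estimate_1924

# Coefficient of curvilinear correlation from the unexplained variation.
unexplained = 546.28  # sum of squared residuals from the graphic curves
total = 4858.0        # total variation in wheat milled
n = 12

r_squared = 1.0 - unexplained / total
r = math.sqrt(r_squared)
standard_error = math.sqrt(unexplained / n)  # dividing by N, as the text directs

print("estimate and residual for 1924:", round(estimate_1924, 1), round(residual_1924, 1))
print("R squared and R:", round(r_squared, 4), round(r, 3))
print("standard error of estimate:", round(standard_error, 2))
```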

Graphic methods have not been devised for obtaining measures strictly 
analogous to coefficients of partial correlation. Nevertheless, much can 

be learned concerning the relative im- 
portance of the different variables 
simply from inspection of the final 
charts. Roughly, it may be said that 
the importance of the different fac- 
tors varies directly with the vertical 
distance occupied on the charts by 
the different curves. 

Limitations of graphic method. Although the graphic method is extremely
flexible, it is also highly subjective. Rarely would two statisticians obtain
curves exactly alike from the same data. Consequently, good results can be
obtained only by persons of experience and good judgment. This is in contrast
with the mathematical procedure based on the method of least squares, in which
case any competent computer can obtain only one possible result for a given
equation type. A practical difficulty also is inherent in the method when
a large number of variables are employed. The shape of each curve is determined
not solely by the appearance of the individual scatter diagram; in drawing any
first approximation curve, special importance is accorded to those observations
which remain constant with respect to the other variables, and, when making
successive approximations, consideration must be given to the effect which will be
had on other scatter diagrams based on the residuals from the chart in
question. Effective technique for so doing is lacking when there are more
than three independent variables. It must also be remembered that, as
more bends are introduced in the estimating curves, additional degrees of
freedom are lost and the results become less reliable. Nor is it possible to
say exactly how many constants are involved in a freehand curve; consequently
the reliability of the coefficients can be appraised only roughly by
estimating the number of constants involved. In the particular illustration
used in this section, additional problems are involved, since the data
used are a time series. These problems, however, are not peculiar to the
graphic method, and will be explained in the final chapter of this book.

[Chart 243. Income of Industrial Workers ($X_3$) and Deviations of Wheat
Milled From Curve I' Plotted Around Curve II. (No second approximation
to the net relationship between income of industrial workers and variations in
amount of wheat milled has been made.) Horizontal axis: income of industrial
workers $X_3$.]

CHAPTER XXV

CORRELATION OF TIME SERIES AND 
FORECASTING 


Correlation of Time Series 

Preliminary adjustment of data. The technique ordinarily employed
in the correlation of time series is exactly the same as that of correlating
any other kind of data, the sole difference being in the nature of the
variations compared. Let us take as an illustration the production and price
of tame hay in the United States, by years, 1900-1936. The data are
shown in the first three columns of Table 177, and are plotted as time
series by solid lines in Chart 244, and as a scatter diagram in Chart 245.
It is apparent from the latter chart that the correlation between the two
series is negligible. Yet the two sections of Chart 244 indicate that there
is rather high negative correlation between the short term movements of
these series. On the other hand, the trends of the two series (shown by
broken lines) are positively correlated. The result is that the positive long
term relationship almost exactly cancels the negative short term relationship.

Since the trends can be compared by comparing their trend equations,
it seems more fruitful to correlate, not the total movements of the two
series, but only their short term movements. Sometimes this is done by
correlating the year-to-year changes (first differences). These annual
changes are shown in Table 177, and in the scatter diagram Chart 246.
The coefficient of correlation is -.64. It is obvious that a decrease in
price is associated with an increase in production, while an increase in price
is associated with a decrease in production. This is as we should expect.
Nevertheless, there are logical objections to the correlation of absolute
changes. First, relating each value to the preceding year only partially
adjusts for trend in price or quantity. Second, an increase in production
from an abnormally low point would have a different effect on price than
an increase of the same magnitude from a level which was already above
normal. Finally, an increase in production from a low level, regardless of
whether it was above or below normal, would affect price differently than
would a similar absolute increase from an absolutely high level. This third
logical difficulty can be overcome by correlating percentages of preceding
year rather than first differences. Percentages of preceding year are
calculated in the last two columns of Table 177, and Chart 247 is a scatter
diagram of the results. The correlation coefficient is -.66. The picture
is not greatly different, except that the relationship now is perhaps
non-linear.

TABLE 177

Production and Price of Tame Hay, Absolute Change, and Per Cent of
Preceding Year, by Years, 1900-1936

        Production    Price      Change from            Per cent of
Year    (millions     per ton    preceding year         preceding year
        of tons)      (dollars)  Production   Price     Production   Price

1900      49.8          9.78
1901      53.1          9.88        3.3        0.10        107        101
1902      59.1          9.05        6.0       -0.83        111         92
1903      63.6          9.18        4.5        0.13        108        101
1904      65.6          8.82        2.0       -0.36        103         96
1905      66.6          8.49        1.0       -0.33        102         96
1906      60.4         10.40       -6.2        1.91         91        122
1907      66.3         11.60        5.9        1.20        110        112
1908      71.6          9.08        5.3       -2.52        108         78
1909      68.8         10.50       -2.8        1.42         96        116
1910      62.9         12.16       -5.9        1.66         91        116
1911      52.1         14.41      -10.8        2.25         83        119
1912      69.1         11.68       17.0       -2.73        133         81
1913      62.3         12.36       -6.8        0.68         90        106
1914      65.8         11.11        3.5       -1.25        106         90
1915      73.3         10.65        7.5       -0.46        111         96
1916      81.2         11.18        7.9        0.53        111        105
1917      71.1         17.08      -10.1        5.90         88        153
1918      68.5         20.07       -2.6        2.99         96        118
1919      76.6         20.15        8.1        0.08        112        100
1920      76.2         17.78       -0.4       -2.37         99         88
1921      71.0         12.09       -5.2       -5.69         93         68
1922      80.8         12.55        9.8        0.46        114        104
1923      75.3         14.10       -5.5        1.55         93        112
1924      78.9         13.80        3.6       -0.30        105         98
1925      67.3         13.95      -11.6        0.15         85        101
1926      67.1         14.08       -0.2        0.13        100        101
1927      83.3         11.30       16.2       -2.78        124         80
1928      72.2         12.22      -11.1        0.92         87        108
1929      76.1         12.19        3.9       -0.03        105        100
1930      64.0         12.62      -12.1        0.43         84        104
1931      66.6          9.03        2.6       -3.59        104         72
1932      71.8          6.65        5.2       -2.38        108         74
1933      66.5          8.11       -5.3        1.46         93        122
1934      55.3         13.95      -11.2        5.84         83        172
1935      78.1          7.80       22.8       -6.15        141         56
1936      63.3         11.39      -14.8        3.59         81        146

Source: Production, 1900-1923: United States Department of Agriculture ... p. 534.
... -1935: Agricultural Statistics, 1937, p. 226.
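The first differences and percentages of preceding year in Table 177, and the correlation coefficients based on them, can be recomputed from the first two data columns. The sketch below is illustrative and assumes NumPy; it works from the unrounded percentages, so its second coefficient may differ slightly from one computed from the rounded table entries.

```python
import numpy as np

# Table 177, 1900-1936: production (millions of tons) and price (dollars per ton).
production = np.array([
    49.8, 53.1, 59.1, 63.6, 65.6, 66.6, 60.4, 66.3, 71.6, 68.8, 62.9, 52.1, 69.1,
    62.3, 65.8, 73.3, 81.2, 71.1, 68.5, 76.6, 76.2, 71.0, 80.8, 75.3, 78.9, 67.3,
    67.1, 83.3, 72.2, 76.1, 64.0, 66.6, 71.8, 66.5, 55.3, 78.1, 63.3,
])
price = np.array([
    9.78, 9.88, 9.05, 9.18, 8.82, 8.49, 10.40, 11.60, 9.08, 10.50, 12.16, 14.41,
    11.68, 12.36, 11.11, 10.65, 11.18, 17.08, 20.07, 20.15, 17.78, 12.09, 12.55,
    14.10, 13.80, 13.95, 14.08, 11.30, 12.22, 12.19, 12.62, 9.03, 6.65, 8.11,
    13.95, 7.80, 11.39,
])


def correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Ordinary coefficient of correlation between two series."""
    return float(np.corrcoef(x, y)[0, 1])


# Absolute change from preceding year (first differences).
change_production = np.diff(production)
change_price = np.diff(price)

# Per cent of preceding year.
pct_production = 100.0 * production[1:] / production[:-1]
pct_price = 100.0 * price[1:] / price[:-1]

r_original = correlation(production, price)
r_changes = correlation(change_production, change_price)
r_percentages = correlation(pct_production, pct_price)

print("original series:           ", round(r_original, 2))
print("first differences:         ", round(r_changes, 2))
print("per cent of preceding year:", round(r_percentages, 2))
```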

In order to make a more logical comparison, we must correlate, not 
percentages of the preceding year, but percentages of normal. In the 
present instance it seems best to make still another adjustment in the 
price series. Besides the price trend that is characteristic of hay in par- 
ticular, we have changes in the series which are associated with changes 


MILLIONS 
OF TONS 



DOLLARS 
PER TOM 



Chart 244* Production and Price of Tame Hay in the United States> and. Sectdar 
Trends, by Years, X900-193<S* (Bata of Table 177,) 
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. ^ 

in commodity prices in general. To adjust for both of these factors, we 
first divide the price of hay by an index of commodity prices. Although 
it is difficult to say what index is most appropriate, the United States 
Bureau of Labor Statistics Index of Wholesale Prices has been used. The 
adjusted data may now be referred to as expressing changes in the real 
price of tame hay. The second step is to compute a trend for the real 
price series. The progress to this point is shown in Chart 248. The trend 
is a third degree curve. The final step is to divide the real price series by 

Chart 245. Scatter Diagram of Production and Price of Tame Hay, by Years,
1900-1936. (Data of Table 177.)
Scales: price, dollars per ton (vertical); production, millions of tons (horizontal).

the trend values. These three steps are shown in Table 178, columns 5-9. 
It would be entirely logical to adjust hay production for changes in the 
number of animal units, and not entirely unreasonable to adjust the hay 
production figures for changes in general production in the United States, 
or in production of commodities which are substitutes for hay. 

For purposes of this illustration, however, no such adjustment has been
made. A second degree curve has been fitted directly to the data, and
they have then been divided by the trend values. The numerical results
are given in Table 178, columns 2-4. A comparison of the two adjusted
time series is afforded by Chart 249, and by the scatter diagram, Chart
250. The correlation coefficient is r = -.79, which is materially higher
than that obtained by the other two methods.
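The three steps applied to the price series (deflate by the price index, take the trend of the real price, express the real price as a per cent of trend) may be sketched as follows. This is a minimal illustration, not the authors' worksheet: the function names are ours, the trend values are copied from column 8 of Table 178 rather than refitted, and only three years are shown.

```python
def real_price(nominal, price_index):
    """Nominal price divided by a price index on a base of 100."""
    return 100 * nominal / price_index


def per_cent_of_trend(actual, trend):
    """Actual value expressed as a per cent of its trend value."""
    return 100 * actual / trend


# year: (nominal price, wholesale price index [1926 = 100], trend of real price)
rows = {
    1900: (9.78, 56.1, 15.78),
    1911: (14.41, 64.9, 16.32),
    1926: (14.08, 100.0, 12.96),
}

adjusted = {}
for year, (nominal, index, trend) in rows.items():
    real = real_price(nominal, index)
    adjusted[year] = (round(real, 2), round(per_cent_of_trend(real, trend)))

for year, (real, pct) in adjusted.items():
    print(year, real, pct)
```

The production series is treated in the same way except that no deflation is needed; only the division by trend applies.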

Another method of dealing with the trend factor when correlating time 

TABLE 178

Adjustment of Production and Price of Tame Hay Data for Purposes of
Correlation, 1900-1936

              Production                                      Price
        ------------------------   ---------------------------------------------------------
                        Per cent   Nominal   Price index   Real price Y                Per cent
Year      X      Xc     of trend    price    [1926 = 100]  [Col. 5 ÷ Col. 6]    Yc     of trend
                        [X ÷ Xc]                                                       [Y ÷ Yc]
 (1)     (2)     (3)      (4)        (5)         (6)             (7)            (8)      (9)

1900    49.8    53.8       92        9.78        56.1           17.43          15.78     110
1901    53.1    55.6       96        9.88        55.3           17.87          16.09     111
1902    59.1    57.3      103        9.05        58.9           15.37          16.33      94
1903    63.6    58.8      108        9.18        59.6           15.40          16.52      93
1904    65.6    60.4      109        8.82        59.7           14.77          16.65      89
1905    66.6    61.8      108        8.49        60.1           14.13          16.73      84
1906    60.4    63.1       96       10.40        61.8           16.83          16.76     100
1907    66.3    64.4      103       11.60        65.2           17.79          16.75     106
1908    71.6    65.6      109        9.08        62.9           14.44          16.63      87
1909    68.8    66.6      103       10.50        67.6           15.53          16.60      94
1910    62.9    67.6       93       12.16        70.4           17.27          16.48     105
1911    52.1    68.6       76       14.41        64.9           22.20          16.32     136
1912    69.1    69.4      100       11.68        69.1           16.90          16.14     105
1913    62.3    70.2       89       12.36        69.8           17.71          15.94     111
1914    65.8    70.9       93       11.11        68.1           16.31          15.71     104
1915    73.3    71.5      102       10.65        69.5           15.32          15.48      99
1916    81.2    72.0      113       11.18        85.5           13.08          15.23      86
1917    71.1    72.4       98       17.08       117.5           14.54          14.97      97
1918    68.5    72.8       94       20.07       131.3           15.29          14.72     104
1919    76.6    73.1      105       20.15       138.6           14.54          14.46     101
1920    76.2    73.3      104       17.78       154.4           11.52          14.20      81
1921    71.0    73.4       97       12.09        97.6           12.39          13.95      89
1922    80.8    73.4      110       12.55        96.7           12.98          13.72      95
1923    75.3    73.4      103       14.10       100.6           14.02          13.50     104
1924    78.9    73.2      108       13.80        98.1           14.07          13.29     106
1925    67.3    73.0       92       13.95       103.5           13.48          13.11     103
1926    67.1    72.7       92       14.08       100.0           14.08          12.96     109
1927    83.3    72.4      115       11.30        95.4           11.84          12.83      92
1928    72.2    71.9      100       12.22        96.7           12.64          12.74      99
1929    76.1    71.4      107       12.19        95.3           12.79          12.69     101
1930    64.0    70.7       91       12.62        86.4           14.61          12.68     115
1931    66.6    70.0       95        9.03        73.0           12.37          12.71      97
1932    71.8    69.2      104        6.65        64.8           10.26          12.79      80
1933    66.5    68.4       97        8.11        65.9           12.31          12.92      95
1934    55.3    67.4       82       13.95        74.9           18.62          13.11     142
1935    78.1    66.4      118        7.80        80.0            9.75          13.36      73
1936    63.3    65.3       97       11.39        80.8           14.10          13.67     103

Source: See Table 177. Price Index is United States Bureau of Labor Statistics Index of Wholesale
Prices, published in United States Bureau of Labor Statistics, Wholesale Prices 1931, p. 14, and Wholesale
Prices, December and Year 1937, p. 3.
series is to introduce a third variable, time, and employ multiple correlation
analysis. This is done instead of eliminating the trend, and has the
added advantage (when dealing with problems like the one under discus- 
sion) of showing the net annual change in price which is a result of changing 
demand. 

The above illustration employed annual data. Had monthly data been 
used, it would have been desirable to have deseasonalized the data also, 
since the presence of violent seasonal movements might have distorted 

Chart 246. Scatter Diagram of Change from Preceding Year of Production and
Price of Tame Hay, 1900-1936. (Although there are 36 observations, only 35 dots are
visible since two observations coincide. Data of Table 177.)
Scales: change in price (vertical); change in production (horizontal).

the correlation of the cyclical or other short term movements being com- 
pared. This procedure will be treated later in this chapter. 

Correlation of adjusted cyclical relatives. A comparison of the cycles 
of production and real price of tame hay expressed as percentages of normal, 
as shown in Chart 249, may readily be made. However, the graphic com- 
parison of two series that differ greatly in amplitude is difficult. Thus, 
although we can tell by inspection of section A of Chart 251 that the turn- 
ing points of passenger car production and electric power production are 
the same, the two curves are at times so far apart that it is difficult to 
judge how closely they are associated throughout. Mathematically, of
course, the closeness of relationship may be ascertained by computing the
coefficient of correlation by the customary product-moment formula, as
in Table 179, part A.

For graphic visualization, however, it is helpful to make two further 
adjustments in the data before plotting. These adjustments, which are 
embodied in section B of Chart 251, are as follows. First, the data are 
converted into percentage deviations from normal by subtracting 100 (or 
the actual mean) from each per cent of normal. If the trend has been 
fitted to the data by the method of least squares for a period exactly coinciding

Chart 247. Scatter Diagram of Per Cent of Preceding Year of Production and Price
of Tame Hay, 1900-1936. (Data of Table 177.)
Scales: price, per cent of preceding year (vertical); production, per cent of preceding year (horizontal).

with the period under comparison, the mean of the series will be
approximately 100 per cent. If the trend covers a longer period than do
the data being correlated, or if the trend has been extended, or if some
method other than least squares has been used (as in the present instance),
it will generally be advisable to use deviations from the actual mean rather
than from 100 per cent (in order that $\Sigma x = 0$ and $\Sigma y = 0$), if the method
of correlation about to be explained is to be used. In the present instance 
deviations are taken from the mean. Second, each series is expressed in 
units of its standard deviation. (Sometimes the average deviation is used 
instead.) Thus the standard deviation of electric power production has 
been computed as in Table 179, part B, and the deviation for each year 
TABLE 179

Correlation of Cyclical Movements of Electric Power Production and Passenger Car Production, 1921-1932

$$r = \frac{12(112,284.52) - (1,191.29)(1,118.28)}{\sqrt{[12(118,606.85) - (1,191.29)^2]\,[12(111,263.29) - (1,118.28)^2]}} = +.816.$$

Source: United States Department of Commerce, Survey of Current Business, 1936 Supplement and subsequent issues. The trend fitted to electric power production is
not that compiled in Chapter XV but is a high-low mid-point trend. This type of trend was used also for passenger car production, and is the same as that given in
Table 92.
has been divided by this value (5.343). A similar procedure has been followed
for passenger car production; the standard deviation for this series
is much larger (24.24). When each series has been divided by its own 
standard deviation, the two resulting series show the same degree of fluc- 
tuation. It is now much easier to compare the two series graphically. 
The degree of conformance is seen in part B of Chart 251 to be high. Now, 
r is most easily computed by the expression

$$r = \frac{1}{N}\,\Sigma\left(\frac{x}{\sigma_x}\cdot\frac{y}{\sigma_y}\right).$$

This formula was given on page 666, note 13. (If the data have been converted
into average deviation units instead of standard deviation units,
the formula on p. 804, note 1, should be used.) Making our substitutions
from part B of Table 179, we find

$$r = \frac{1}{12}(+9.792) = +.816.$$

Chart 249. Production and Real Price of Tame Hay, as Percentages of Trend, 1900-1936.
(Data of Table 178.)
Scales: per cent of trend (vertical); years, 1900-1936 (horizontal).

Any of the formulae for r with which the reader is now familiar could be 
used instead of this one, but the above is easiest if each series has already 
been adjusted for amplitude by dividing through by its standard devia- 
tion for purposes of graphic comparison. Of course, it will not always, or 
even usually, be found advisable to express the data in deviation form 
and to adjust the series for differences in amplitude by dividing each 
series by its respective standard deviation. It is much more laborious to 
make the adjustments and then correlate, than it is to correlate by the 
customary method shown in part A of Table 179. Incidentally, this table 
shows that the same value of r is obtained when correlating the data in
terms of σ, as when per cent of trend values are correlated.
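That the two routes give the same r can be checked with a short Python sketch. The figures below are hypothetical per-cent-of-trend values invented for illustration (they are not the data of Table 179), and the function names are ours.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Customary product-moment formula, computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def standard_units(values):
    """Deviations from the actual mean, divided by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sigma = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sigma for v in values]


def r_from_standard_units(xs, ys):
    """r as the mean product of the two series in standard-deviation units."""
    zx, zy = standard_units(xs), standard_units(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical per cent of trend values for two series (illustration only).
power = [96.0, 99.5, 104.0, 101.0, 103.5, 106.0, 98.0, 92.0]
cars = [80.0, 105.0, 125.0, 98.0, 110.0, 130.0, 85.0, 60.0]

print(pearson_r(power, cars))
print(r_from_standard_units(power, cars))
```

The second route costs more arithmetic, as noted above, but its intermediate series are exactly what is wanted for plotting two curves of unequal amplitude on one scale.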

An interesting by-product is available when this method of correlation
is used. It was explained on page 666, note 13, that r could be thought
of as the slope of the estimating line $b_{yx}$ when each series has been expressed
in terms of its own standard deviation, that is, $b_{\sigma_y \sigma_x}$. This procedure
has literally been followed in the present instance, and in Chart 252

Chart 250. Scatter Diagram of Production and Real Price of Tame Hay as Percentages
of Trend, 1900-1936. (Although there are 37 observations, only 36 dots are
visible since two observations coincide. Data of Table 178.)
Scales: real price, per cent of normal (vertical); production, per cent of normal (horizontal).

the adjusted data have been plotted and the line $\frac{y}{\sigma_y} = .816\,\frac{x}{\sigma_x}$ has been
shown. This chart gives visual evidence of the correctness of the above
way of looking at r.



Chart 251. Passenger Car Production and Electric Power Production (A) as Percentage
Deviations from Trend and (B) as Deviations from Trend in Units of Their
Standard Deviations, 1921-1932. (Data of Table 179.)
Scales: (A) per cent; (B) standard deviations.
Problems in correlating time series. It must be evident that the value
of the correlation coefficient is affected by the type of trend fitted to the
data and the period to which it is fitted. If a period of 10 years is being
correlated, it would not be logical to use for one series a section of a trend
fitted over a 100-year period, and for the other a trend fitted to data extending
over 10 years only. The former trend would, in all likelihood,
fail to pass through the approximate center of each cycle, and might not
even touch some of the cycles. Consequently the correlation
might understate or overstate the degree of relationship between the series.
It should also be apparent that the use of an inflexible trend for one series
and a flexible trend for the other would produce similar results. If we
wish to correlate cyclical movements, it seems best therefore to use a
trend that goes approximately through the center of each cycle. It may
be that no simple mathematical curve will be satisfactory and that some
relatively subjective method, such as the high-low mid-point method, will
have to be resorted to, at least as a first approximation.

Chart 252. Scatter Diagram of Passenger Car Production and Electric Power Production
as Percentage Deviations from Trend in Units of Their Standard Deviations.
(Data of Table 179B.)
Scales: passenger car production (vertical); electric power production (horizontal); units: σ.

Another problem to consider is whether the Pearsonian method of correlation,
based on the second moments, is appropriate for correlating time
series. The fluctuations of a time series are not usually distributed norm- 
ally around the trend line. There are often a few extreme deviations, 
which, when squared, largely determine the value of r. In our last illus- 
tration, the distribution is fairly regular, although 1929 is rather high and 
1932 is exceptionally low. With this problem in mind, some authorities 
suggest the use of the rank method when the extreme deviations are par- 
ticularly large. Another solution is the use of a formula based on first 
moments, rather than second.¹ In view of the fact that interest frequently
centers in whether two series are moving in the same general direction 
(positive or negative) at the same time, without regard to the magnitude 
either of their level or their change, it may be that none of the orthodox 
methods of correlation are satisfactory. 

A further difficulty in correlating time series is that we have no logical
basis for estimating the reliability of the coefficient of correlation. The
chief objection to the use of any reliability test for r for time series is that
the different observations are not randomly distributed. Each observation
in a time series is related to values in that series for preceding and subsequent
points of time. Furthermore, we cannot generalize concerning the
exact nature of this interrelationship, and hence we cannot develop any
general theory of reliability applicable to this branch of statistics. Perhaps
this difficulty will be more obvious when we ask how many
¹ See “The Validity of Correlation in Time Sequences and a New Coefficient of
Similarity,” by O. Gressens and E. R. Mouzon, Jr., Journal of the American Statistical
Association, Vol. XXII, December 1927, pp. 483-492. This method is further elucidated
and its relation to r explained by George R. Davies, in an article entitled “First
Moment Correlation,” appearing in the Journal of the American Statistical Association,
Vol. XXV, December 1930, pp. 413-427. The formula is

$$c_2 = \frac{\Sigma s\,(2N - \Sigma|s|)}{N^2},$$

where s refers to the smaller of each pair of items when each series is expressed as
deviations from the mean in terms of average deviations ($x/AD_x$ and $y/AD_y$). When
summing algebraically, s is positive if the signs of the paired deviations are alike, and
negative if they are unlike. Using the x and y data of Table 179, part B, we find that

$$c_2 = \frac{7.70(24 - 8.68)}{144} = +.819.$$

The computational labor is much less than that involved in using the formula on p. 800.
Davies also explains certain short cuts in computation.
independent observations are contained in the cyclical relatives used in
the last illustration. Although there are 12 years, there are not 12 degrees
of freedom. There are only three complete cycles (measuring from trough
to trough). Are there then only 3 independent observations? (Subtracting
2 more for the constants a and b in the estimating line leaves only 1
degree of freedom.) But there are more than 3, since each observation 
in a cycle is not completely dependent on the preceding values. If we 
now had monthly data, would we have 144 independent observations for 
the 12 years? Of course not. But, how many we would have it is impos- 
sible to say. 
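The first-moment coefficient described in the footnote above may be sketched in Python as follows. This is our reading of the footnote, not Davies's own procedure: we take "the smaller of each pair" to mean the smaller in absolute value, and all function names are ours. The last line reproduces the footnote's arithmetic from its stated sums.

```python
def average_deviation_units(values):
    """Deviations from the mean, divided by the average (mean absolute) deviation."""
    n = len(values)
    mean = sum(values) / n
    avg_dev = sum(abs(v - mean) for v in values) / n
    return [(v - mean) / avg_dev for v in values]


def c2_from_sums(sum_s, sum_abs_s, n):
    """c2 = (sum of s)(2N - sum of |s|) / N^2, as given in the footnote."""
    return sum_s * (2 * n - sum_abs_s) / n ** 2


def first_moment_coefficient(xs, ys):
    """Assumed reading: s is the smaller absolute deviation of each pair,
    taken positive when the paired signs agree and negative when they differ."""
    ax, ay = average_deviation_units(xs), average_deviation_units(ys)
    s = []
    for a, b in zip(ax, ay):
        smaller = min(abs(a), abs(b))
        s.append(smaller if a * b > 0 else -smaller)
    return c2_from_sums(sum(s), sum(abs(v) for v in s), len(xs))


# Sums quoted in the footnote for Table 179, part B (N = 12).
print(round(c2_from_sums(7.70, 8.68, 12), 3))
```

Under this reading the coefficient is +1 when the two series are identical and -1 when one is the mirror image of the other, which agrees with the behavior expected of a correlation measure.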


Measurement of Lag 

It is an aid to the understanding of economic processes to measure the 
period of time by which one series precedes another. For a business man 
it is especially profitable to be able to predict, a number of months in 
advance, when business will pick up or recede. Since the turning points 
in all time series do not occur at the same time, it is necessary to pick out 
a series which precedes, with some degree of regularity in its turning points, 
the series we wish to predict, and observe how many months' interval
there is between the turning points of the two series. In order to do this 
precisely, the device of correlation is frequently resorted to. The two 
series having been reduced to per cent of normal, they are plotted on sep- 
arate sheets of graph paper, with scales so chosen that the amplitudes of 
fluctuation will be about the same. These sheets are then placed together 
and held up to a light, and when they are slid back and forth, some point 
is reached at which the correspondence is closest. 

In the accompanying illustration, the Index of Industrial Production of 
the Board of Governors of the Federal Reserve System is to be forecast 
by an index of production of durable consumers goods constructed by the 
writers. Both series have been adjusted for trend and seasonal variation, 
and the former has been slightly smoothed. Chart 253 shows the two 
series superimposed: (A) with no lag of either series; and (B) with the 
series to be forecast moved two months to the left, so that January 1919 
of the durable consumers goods series is even with March 1919 of the 
Federal Reserve index. This last position seems to show the best corre- 
spondence between the two series; it is the best visual estimate of the lag. 
In other words, increases in durable consumers goods production seem to 
precede increases in general industrial production by about two months. 
With the 2-month lag of industrial production, r = +.933. Computations
are shown in Table 180.

It may be, however, that better correlation would have been obtained
with some other period of lag. Logically it may even be hypothecated
that production of durable goods for consumers waits on purchasing power
derived from a general industrial pickup. The various values for r with
different lag assumptions are as follows:


If production of durable consumers goods lags behind industrial production:

      Lag              r

  10 months         +.729
   9 months         +.743
   8 months         +.764
   7 months         +.778
   6 months         +.794
   5 months         +.818
   4 months         +.833
   3 months         +.857
   2 months         +.887
   1 month          +.908
   No lag           +.924

If industrial production lags behind production of durable consumers goods:

      Lag              r

   1 month          +.932
   2 months         +.933
   3 months         +.929
   4 months         +.923
   5 months         +.917
   6 months         +.902
   7 months         +.899
   8 months         +.893
   9 months         +.870
  10 months         +.850
  11 months         +.820
  12 months         +.785
  13 months         +.753
  14 months         +.719

The results are shown graphically in Chart 254. It may be concluded 
that production of durable consumers goods forecasts changes in industrial 
production by about two months, and that industrial production is a less 
satisfactory forecaster of durable consumers goods production. 
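The search over lag assumptions can be sketched in Python as below. This is an illustration under stated assumptions: the function names are ours, and the two series are artificial, the second being the first moved two periods later, so that the expected answer is known in advance. The actual index numbers are not reproduced here.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment coefficient of correlation from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


def lagged_r(leader, follower, lag):
    """r between leader[t] and follower[t + lag].
    A negative lag pairs each follower value with a later leader value."""
    n = len(leader)
    if lag >= 0:
        xs, ys = leader[:n - lag], follower[lag:]
    else:
        xs, ys = leader[-lag:], follower[:n + lag]
    return pearson_r(xs, ys)


def scan_lags(leader, follower, max_lag):
    """Compute r for every lag from -max_lag to +max_lag; return the best lag and all r's."""
    results = {lag: lagged_r(leader, follower, lag)
               for lag in range(-max_lag, max_lag + 1)}
    return max(results, key=results.get), results


# Artificial series: `follower` repeats `leader` two periods later.
leader = [3, 7, 1, 9, 4, 8, 2, 6, 5, 10, 0, 11, 4, 9, 3, 7, 12, 1, 8, 5]
follower = [5, 6] + leader[:-2]

best, results = scan_lags(leader, follower, max_lag=4)
print(best)
```

Each lag shortens the overlapping period by that many observations, so the r values for long lags rest on fewer pairs than those for short lags.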

It would be useless to compute $\sigma_r$ for these data, and worse than useless
to make the z transformation and attempt to evaluate the significance of
the difference between these r values. As has already been explained, the
interdependence of the different time series observations invalidates the
usual procedures for judging the reliability of the correlation coefficient.
Nor is it clear how many degrees of freedom are sacrificed when such a lag
adjustment is made.² Just as a random sample from an uncorrelated


² The authors obtained a correlation of -.84 from random data of 27 items adjusted
for a 3-period lag. See Croxton and Cowden, Practical Business Statistics, pp. 457-460,
Prentice-Hall, Inc., New York, 1934.
population is likely to show some correlation, so any sample which is
uncorrelated when taken synchronously will show correlation when adjusted
for lag. It is better, therefore, not to attempt a mathematical
estimate of the reliability of r. It is advisable, however, to include a
considerable period of time in the series, and to compute r for various

Chart 253. Cyclical Movements of Durable Consumers Goods Production Index
and Federal Reserve Index of Industrial Production, 1919-1936: A. Synchronous, B.
With Industrial Production Moved 2 Months to the Left. (The time scale refers to
durable consumers goods. Data of Table 180.)


sub-periods of time before coming to anything but the most tentative 
conclusion. However, even if this procedure gives the statistician confi- 
dence that he has discovered a real relationship, it does not necessarily 
follow that the equation will be useful in forecasting the future. Each
trend, deviations from which are correlated, must be capable of being extrapolated
without serious error. Also, economic institutions and conditions
are constantly changing, and what may have been an important
relationship in the past may be either of a different nature or of smaller
relative importance in the future.

The equation for estimating industrial production, Y, from durable consumers
goods production, X, may be obtained by the formula

$$(Y_c - \bar{Y}) = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}).$$

TABLE 180

Correlation of Federal Reserve Index of Industrial Production and Durable
Consumers Goods Production with 2-Month Lag of Industrial Production,
1919-1936

Year and month       Durable consumers    Industrial
(for durable         goods production     production
consumers goods)*           X                 Y              XY            X²            Y²

1919:
  January                   80                96           7,680         6,400         9,216
  February                  97                98           9,506         9,409         9,604
  March                    107                98          10,486        11,449         9,604
  April                    119               104          12,376        14,161        10,816
  May                      133               109          14,497        17,689        11,881
  June                     153               110          16,830        23,409        12,100
  July                     157               107          16,799        24,649        11,449
  August                   156               106          16,536        24,336        11,236
  September                156               105          16,380        24,336        11,025
  October                  154               105          16,170        23,716        11,025
  November                 149               116          17,284        22,201        13,456
  December                 135               116          15,660        18,225        13,456
  . . .
1936:
  January                   60                71           4,260         3,600         5,041
  February                  60                76           4,560         3,600         5,776
  March                     60                77           4,620         3,600         5,929
  April                     66                79           5,214         4,356         6,241
  May                       70                82           5,740         4,900         6,724
  June                      74                82           6,068         5,476         6,724
  July                      78                83           6,474         6,084         6,889
  August                    78                83           6,474         6,084         6,889
  September                 78                86           6,708         6,084         7,396
  October                   73                91           6,643         5,329         8,281

  Total                 17,612            18,984       1,692,884     1,703,842     1,761,028

* Durable consumers goods production (X) is from January 1919 through October 1936. Industrial
production (Y) is from March 1919 through December 1936. See p. 805 for explanation.

Source: Durable Consumers Goods Production Index was computed by the authors. The Federal Reserve
Industrial Production Index (Manufacturing and Minerals) was taken from Standard Trade and
Securities, Statistics, Vol. 3, p. D-4, and Current Statistics, May 1937.
Substituting in this equation gives the equation

$$Y_c = 46.485 + .51307X.$$

This equation may be used for making forecasts of cyclical movements 
for individual months. Thus, since the index number for durable consumers 
goods production stood at 80 for February 1937, our best estimate for
industrial production for April 1937 is

$$Y_c = 46.485 + .51307(80) = 87.53.$$

Chart 254. Values of r for Durable Consumers Goods Production and Industrial Production
with Different Assumptions of Lag. (For data see p. 806.)

The actual index number for April 1937 was 88. Such a close agreement
is unusual, however. The standard error of estimate is

$$\sigma_{ys} = \sigma_y\sqrt{1 - r^2} = 6.83.$$

This indicates that, if estimates had been made from the equation for
each month included in the period covered by the data, about two-thirds
of the estimates would have been in error by less than 6.83. It must be
remembered, however, that we are not dealing with variables which follow
a chance distribution, and therefore $\sigma_{ys}$ should be thought of as only a
very rough estimate of the range within which 68 per cent of the actual
Y values fall.
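The figures just quoted can be retraced from the totals of Table 180 with a short Python sketch. One value is our assumption rather than the table's: N = 214 is the count of months from January 1919 through October 1936, inferred from the table note. Variable names are ours, and small differences in the last decimal place arise from rounding.

```python
from math import sqrt

n = 214  # assumed: months from January 1919 through October 1936
sum_x, sum_y = 17_612, 18_984            # totals of X and Y, Table 180
sum_xy, sum_xx, sum_yy = 1_692_884, 1_703_842, 1_761_028

# Sums of products and squares about the means, each multiplied by N.
sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_xx - sum_x ** 2
syy = n * sum_yy - sum_y ** 2

r = sxy / sqrt(sxx * syy)                # coefficient of correlation
b = sxy / sxx                            # slope, equal to r * sigma_y / sigma_x
a = sum_y / n - b * sum_x / n            # intercept
sigma_y = sqrt(syy) / n                  # standard deviation of Y
std_error = sigma_y * sqrt(1 - r ** 2)   # standard error of estimate

forecast = a + b * 80                    # durable goods index of 80, February 1937

print(round(r, 3), round(a, 3), round(b, 5))
print(round(std_error, 2), round(forecast, 2))
```

The forecast applies to the month two months after the X reading, since Y was paired with X at a 2-month lag when the sums were formed.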
The slowness with which economic data are reported and the scarcity
of time series on a basis shorter than a month are factors that impair the
usefulness of this method. It is quite possible that weekly, daily, or hourly
data might bring to light relationships which are known and utilized only
by a few “insiders.” The theorist argues that all economic processes are
interrelated. It does not seem logical that the cause-and-effect relation- 
ships which supposedly surround us on every side must always take a 
month or more for their development. There must be many that work 
out in a few days, a few hours, or nearly instantaneously. If the market 
hears that a new industrial use has suddenly been announced for copper, 
it does not wait weeks or even hours to show its reaction in a price change. 
As data are made available upon a weekly, daily, or more frequent basis, 
it is conceivable that very useful lags and leads may be obtained. 

The use of correlation procedure in forecasting is subject to all the ob- 
jections previously raised to its use in correlating time series, and others 
as well. These objections are: 

(1) Being based on the second moments, r gives undue influence to the
occasional extreme deviations characteristic of time series. In fact, some 
statisticians insist that a person’s visual impression of lag is a more satis- 
factory measure than is r. 

(2) The lag may be different at recession than it is at revival. As 
was mentioned in Chapter XIX, the National Bureau of Economic Re- 
search computes average lag or lead at revival and at recession with respect 
to its reference cycle. 

(3) Interest often centers mainly on turning points, while r gives equal 
importance to lags at all phases of the cycle. It may be profitable to be 
able to foretell merely when to expect a change in direction, even though 
the amount of change cannot be forecast. 

(4) It is a laborious process to compute r for a large number of lag 
hypotheses. 

(5) In addition to criticisms of the coefficient of correlation as a measure 
of relationship, one may also criticize the nature of the variations corre- 
lated, arguing that a person can more accurately predict the future with 
respect to the present than he can with respect to some normal, which is 
often difficult to estimate correctly. 

Distribution of lag. A refinement of the usual lag measurement has
been introduced by Irving Fisher. It is his contention that the business
cycle is largely a “dance of the dollar.” But although price changes are a
dominating factor in changes in the volume of trade and employment, a
given price change does not have its entire effect in any one month.
Rather, the effect is distributed over a number of months, reaching a climax
after a certain period of time and then dwindling in importance, after the
fashion of the ordinates of a frequency distribution. The type of distri- 
bution which Fisher considers most logical is one which becomes normal 
if the logarithm of time is taken as the abscissa, the origin being the month 
whose price change is under consideration. Furthermore, it is not the 
price level that is considered the causal factor, but the rate of price change. 
Thus the percentage price change for June would be approximately

$$\frac{(\text{July Price} - \text{May Price}) \div 2}{\text{June Price}}.$$

The mechanics of determining the best constants for the logarithmic frequency
distribution (the mode and the standard deviation) involve a great
deal of labor, the major part of which can be saved by simplifying the
hypothesis slightly. If the maximum effect of a given price change is
assumed to be the month (or other unit of time) following its occurrence,
and the decline in influence is assumed to be linear, the statistical problem
consists in determining only for how long a period of time the given cause
exercises any effect whatever. This simplification of the problem does
not usually affect materially the accuracy of the results. Following is a
brief description of a method devised by Fisher for shooting forward and 
cumulating the effect of price changes. 

Determine the best fixed lag by the usual correlation method. Upon 
the hypothesis that the duration N of the influence of price change is 
three times the fixed lag period, compute an estimating index by methods 
which will be explained shortly. Correlate the estimating index with the 
actual data of the series being estimated. Repeat this process with N equal 
to four times the fixed lag period. Try any such third hypothesis as 
seems best, and continue until the highest correlation is obtained.

A procedure for the actual computing of the forecasting index for any
given hypothesis of lag is quoted from an article by Irving Fisher.³ In
order to preserve consistency with our other symbols, X is taken as the
causal factor (price change), rather than a as used by Fisher. Also, t refers

³ See “Note on a Short-Cut Method for Calculating Distributed Lags,” by Irving
Fisher in Extrait du Bulletin de l'Institut International de Statistique, XXIX: 3. In
this article Fisher describes further short-cuts in computation, and refers the reader
for further discussion of principles to Max Sasuly, Trend Analysis of Statistics, Brookings
Institution, 1934, pp. 13^135, 145, 149, 201, and Chapter X. Other articles on this
subject by Irving Fisher are “Our Unstable Dollar and the So-called Business Cycle,”
Journal of the American Statistical Association, June 1925, Vol. XX, pp. 179-202, and
“Changes in the Wholesale Price Index in Relation to Factory Employment,” Journal
of the American Statistical Association, September 1936, Vol. 31, pp. 495-506. This
includes a discussion by Morris A. Copeland and a rejoinder by Irving Fisher.
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to a given point of time. Thus the first month is hj the second roonth t2. 
and so on. Quoting from Fisher (with minor alterations in wording) : 

Let us suppose that the single "cause" $X_1$, at month $t_1$, has its effect distributed over
succeeding months ($t_2$, $t_3$, etc.) in diminishing degrees proportional to the numbers
6, 5, 4, 3, 2, 1, ceasing after the sixth month (that is, after $t_7$). While the total effect
of $X_1$ is proportional to $X_1$ itself, the part of this total which is felt at time $t_2$ is

$$\frac{6}{6+5+4+3+2+1}\ \text{of}\ X_1, \quad \text{or}\quad \frac{6}{21}\ \text{of}\ X_1,$$

and the parts in the five succeeding months are respectively

$$\frac{5}{21}X_1,\quad \frac{4}{21}X_1,\quad \frac{3}{21}X_1,\quad \frac{2}{21}X_1,\quad \frac{1}{21}X_1.$$

Similarly the "cause" $X_2$, at month $t_2$, produces its effect in the succeeding months
$t_3, \ldots, t_8$ in these same proportions, 6, 5, 4, 3, 2, 1; and so on indefinitely, as
indicated below:

        t2   t3   t4   t5   t6   t7   t8   t9   t10
X1:      6    5    4    3    2    1    -    -    -
X2:           6    5    4    3    2    1    -    -
X3:                6    5    4    3    2    1    -
X4:                     6    5    4    3    2    1
X5:                          6    5    4    3    2
X6:                               6    5    4    3
X7:                                    6    5    4
X8:                                         6    5
X9:                                              6

What, then, is the combined effect, at time $t_7$, of all the previous X's? The effect
of $X_1$ at time $t_7$ is $\frac{1}{21}X_1$, as already noted, and as indicated in the above
schedule by the figure 1 under $t_7$. Similarly, the effect of $X_2$ at this same time $t_7$ is
$\frac{2}{21}X_2$; that of $X_3$ is $\frac{3}{21}X_3$; and so on.

The total of all these effects combined, at time $t_7$, is

$$\bar{X}_7 = \frac{X_1 + 2X_2 + 3X_3 + 4X_4 + 5X_5 + 6X_6}{1+2+3+4+5+6}.$$

Note that this total combined effect, at time $t_7$, of the preceding $X_1, X_2, X_3, X_4, X_5, X_6$,
is called $\bar{X}_7$ and is an average of said preceding X's.

Frequently we need not take the trouble to apply the common divisor 21 (= 1+2+3+4+5+6),
since we are essentially concerned only with the relative magnitudes of the
$\bar{X}$'s. We may then prefer to compute merely the numerator of the above fraction
which, in contrast to $\bar{X}_7$, may be called $\Sigma X_7$; and $\Sigma X_7 = 21\bar{X}_7$,
$\Sigma X_8 = 21\bar{X}_8$, etc. More generally

$$\Sigma X_{N+1} = 1X_1 + 2X_2 + 3X_3 + \cdots + NX_N.$$

Thus if, in the numerical example above, the values of $X_1, X_2, X_3, X_4, X_5, X_6$ are
respectively 2, 1, 3, 2, 4, 6, then

$$\Sigma X_7 = 1(2) + 2(1) + 3(3) + 4(2) + 5(4) + 6(6) = 2 + 2 + 9 + 8 + 20 + 36 = 77.$$

Since N = 6, the value of $\bar{X}_7$, the average of the X's corresponding to $\Sigma X_7$, is
$\bar{X}_7 = \frac{77}{21} = 3.67$. In the same way $\Sigma X_8$, $\Sigma X_9$, $\Sigma X_{10}$, etc.,
may be calculated.
The weighted moving total values, $\Sigma X_7$, $\Sigma X_8$, etc. (or $\bar{X}_7$, $\bar{X}_8$, etc.),
are, of course, the values of the estimating index; that is, they are the
values of the derived X series which are to be correlated with the Y series,
$\Sigma X_7$ being paired with the 7th Y value, $\Sigma X_8$ being paired with the 8th Y
value, and so on. Although still further short cuts are available, we shall
not undertake to describe them here.
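
The weighted moving total quoted above can be checked with a short computation. What follows is only a minimal sketch in Python, not Fisher's own short-cut procedure; the function name `weighted_moving_totals` and its parameter `n_weights` are our own labels, and the weights 1 to 6 are those of the numerical example.

```python
def weighted_moving_totals(x, n_weights=6):
    """Return {month: (total, average)} for each month having n_weights predecessors.

    x[0] is X_1, x[1] is X_2, and so on.  The weights 1..n_weights rise toward
    the most recent month, so the month just past counts most heavily.
    """
    weights = range(1, n_weights + 1)
    divisor = sum(weights)                      # 21 when n_weights is 6
    out = {}
    for end in range(n_weights, len(x) + 1):
        window = x[end - n_weights:end]         # the n_weights preceding X values
        total = sum(w * v for w, v in zip(weights, window))
        out[end + 1] = (total, total / divisor) # effect is felt in the following month
    return out


x = [2, 1, 3, 2, 4, 6]                          # X_1 ... X_6 from the text
total_7, average_7 = weighted_moving_totals(x)[7]
print(total_7, round(average_7, 2))             # 77 3.67
```

Each additional X value supplied yields one further derived value, so a series of n months gives only n - 6 + 1 derived values with these weights, which is the loss of observations discussed in the next paragraph.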

The reader should realize that the adjustment for the distribution of 
the lag further reduces the effective number of observations in the derived 
X series, and the number of degrees of freedom available for measuring $\sigma_r$
(if such a measure is at all legitimate) is correspondingly reduced. The 
likelihood of obtaining correlation between the Y series and the derived X 
series when there is no real relationship is even greater than when the ad- 
justment is made for lag without distributing the lag. Consequently the 
statistician should be very cautious about drawing conclusions as a result 
of applying this technique; he should insist upon a strong theoretical argu- 
ment as well as an abundance of statistical evidence. 

Methods of Forecasting

In earlier correlation chapters we have been accustomed to making 
estimates of the value of one variable from our knowledge of the value of 
another variable (or variables) and our knowledge concerning the func- 
tional relationship between (or among) these variables. These estimates 
involve the inference that the relationship which has been inferred from 
the sample is the one that really exists in the population. When corre- 
lating time series, however, it is not usually correct to think of the corre- 
lated data as constituting a sample from the parent population. Economic 
relationships are man made, and they change with changes in environment. 
A forecast may gradually lose its efficacy with the passage of time.
Forecasting involves another difficulty also. There is an interval of time
between the "cause" X and the "effect" Y. In that interval of time other
causes, unknown at the time of the forecast, may intervene, so that the 
effect may be quite different from that which was originally anticipated. 
Since this is true, complete reliance cannot be placed on any procedure, 
statistical or otherwise, for economic forecasting. 

Nevertheless, it is the function of a science to make predictions, and,
as a practical matter, all rational persons do make forecasts whenever they
make any commitment concerning the future. Therefore, any clue that
will help us to guess right concerning the future significantly more often
than we guess wrong is worth noting. In the following sections we shall
summarize some of the methods in vogue at the present time.

Economic rhythm method. In earlier chapters, methods were explained 
for measuring trend, obtaining periodic patterns, and isolating cycles. In 
order to forecast future movements of any series that has been analyzed
into these elements, it is necessary to go through the following steps in 
the order indicated: 

(1) Project the trend a number of years into the future. This may 
be done either freehand or by means of the trend equation. (As mentioned 
earlier, trend projection is fraught with danger.) 

(2) Superimpose on this trend, for a period of perhaps a year or two, 
an estimate of future cyclical movements, being guided by the past cyclical 
behavior of this series. 

(3) Multiply these estimated monthly cyclical trend values by the ap- 
propriate seasonal index numbers. 

The heart of this method is step 2. The trend line is supposed to represent
the normal growth (or decline) of the series. Then, assuming that
history repeats itself (that cycles of approximately the same amplitude
and duration tend to recur), it is a simple procedure to extend the cyclical
design of recent years into the future. This can be done mathematically 
if desired. Experience shows, however, that cycles in most series are not 
periodic and that the mathematical extrapolation of cycles is not satis- 
factory. 
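
The three steps may be put in arithmetic form. The sketch below is an illustration only: it assumes a straight-line monthly trend, a forecast beginning in January, and cyclical and seasonal figures expressed in per cent; every number in it is hypothetical, and the function name `rhythm_forecast` is our own.

```python
def rhythm_forecast(trend_start, trend_step, cycle_relatives, seasonal_index):
    """Forecast = projected trend x assumed cyclical relative x seasonal index.

    trend_start, trend_step: hypothetical straight-line monthly trend.
    cycle_relatives: assumed cyclical position of each future month, per cent of trend.
    seasonal_index: twelve seasonal index numbers in per cent, January first.
    """
    forecasts = []
    for month, cycle in enumerate(cycle_relatives):
        trend = trend_start + trend_step * month             # step 1: project the trend
        cyclical_trend = trend * cycle / 100.0               # step 2: superimpose the cycle
        seasonal = seasonal_index[month % 12]
        forecasts.append(cyclical_trend * seasonal / 100.0)  # step 3: apply the seasonal
    return forecasts


seasonal = [90, 95, 100, 105, 110, 105, 100, 95, 90, 100, 105, 105]
cycle = [102, 101, 100]                                      # judgment estimates, not computed
forecast = rhythm_forecast(200.0, 1.0, cycle, seasonal)
print([round(value, 2) for value in forecast])
```

The cyclical relatives are supplied by the forecaster rather than calculated, which is exactly where the method rests on judgment.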

In this method, as applied by Roger Babson, areas above and below 
the normal (or X-Y) line enclosed by the cyclical curve are noted.^ See 
Chart 255. The X-Y line is constructed by computing the average index 
number of each complete cycle, placing the figure so obtained at the mid- 
point in terms of the span of the cycle, and drawing a line through such 
mid-points for the successive cycles. For uncompleted cycles the line is 
tentatively extended by inspection. If the area above the line exceeds 
that below the line, the forecast is depression. As the area below the line 
approaches in area that above the line, the end of the depression becomes 
near. However, the depression area cannot be used to forecast the size 
of the prosperity area. Areas are even carried along from one portion of 
a cycle to another before being equalized (see areas G+ and ); but this 
makes the practical application of the method very difficult. Another 
difficulty is that an area can be produced by an increase in either amplitude
or duration, whereas the business man is primarily interested in
predicting the turning points of a cycle. Finally, there is the objection that
the equalization is not accurately accomplished until the completion of 
each cycle permits drawing in the final trend line. 

Although this action-and-reaction concept is borrowed from physics, the
economic reasoning is that American business has a steady and persistent
growth, and that cycles are self-generating. The method is applicable
also to the forecasting of occurrences other than business cycles. For
instance, there is the belief held by some authorities that minor movements
in stock prices tend to last a certain length of time, or until a certain
number of shares have been sold.

^ See Babson's Reports, Special Letter, March 27, 1928; also an undated bulletin
"Technical Description of the Babsonchart." Additional information was provided to
the authors by the Research Department of Babson's Reports.

The Dow system is for forecasting swings in stock prices of shorter
duration than the business cycle. The system is based upon a theory of
resistance levels. If the market has moved "sidewise" within a narrow
price range for several weeks, it means that speculative interests have
been either accumulating or distributing stocks. Then, when resistance
to price change has been broken, if the movement is upward it means the
period has been one of "accumulation"; if the "break out" is downward
it means that the period has been one of "distribution." If, now, the
price movement of the industrials is "confirmed" by one in the same direction
by the rails, a sustained movement in that direction is probable.
(Or, the industrials may confirm the rails.) So long as new highs are
made by both groups, a bull market will continue; but if one of the groups
persistently fails to confirm the other, the end of the movement may be
expected soon. A more complete exposition of this theory is given by
Robert Rhea in The Dow Theory, Barron's, 1932. While there is some
logic to the Dow theory, it requires exercise of considerable judgment if
it is to be used successfully.

Chart 255. Babson Chart of Physical Volume of Production in the United States.
(Vertical scale in per cent. Traced from chart appearing in the January 2, 1939,
issue of Babson's Reports and published with the Babson Organization's permission.
This chart shows a revised trend, or X-Y line. In previous charts the trend was
extended from the termination of the prolonged depression area which, according to
Babson, lasted from the middle of 1930 through most of 1936. This extension, which
was shown as a dotted line, sloped upward to the right through 1937 and 1938. The
present trend is horizontal from 1933 to date, as is the extension. The effect is to
enlarge the 1936-1937 prosperity area somewhat and to make the adjacent depression
areas somewhat smaller. Factors supporting the trend revision are briefly described
in Babson's December 5, 1938 issue.)

Specific historical analogy. Since all cycles are not uniform in ampli- 
tude or duration, some forecasters make use of history, not by projecting 
any fancied economic rhythm into the future but by selecting some spe- 
cific previous situation which has many of the earmarks of the present, 
and concluding that what happened in that previous situation will happen 
in the present one. We quote without comment from Charles G. Dawes'
How Long Prosperity? (A. N. Marquis Company, Chicago, 1937), p. 24:

What I present is the unexpected discovery, by a business man in a
study without preconceived theory and for the purpose of finding a reasonable
basis for the profitable investment of money, of certain important
parallels in the last three great depressions in this country of 1873, 1893
and 1929.

Pursuing these parallels, Dawes arrives at the following conclusions (ibid.,
pp. 38-39):

1st. That in the tenth year after the initial stock price collapse in 
both the 1873 period and the 1893 period, there occurred a stock collapse 
marking in each case the commencement of a minor business recession. 

2nd. That these minor business recessions (known as those of 1884 
and 1904) lasted in the 1884 period approximately two years, and in the 
1904 period approximately one year. 

3rd. That prosperous business conditions then ensued. 

I predict, therefore, barring wars or inflation of the currency: 

1st. That a high degree of prosperity will maintain in this country
into 1939.

2nd. That beginning in latter part of the year October 1938-October 
1939, the tenth year from October 1929, to wit: in the summer 
or fall of 1939, there will be a stock market collapse. 

3rd. That there will then ensue in the United States a minor recession
in business of one or two years.

4th. That this recession will be followed by a period of prosperity. 

The method of specific historical analogy is probably relied upon more
heavily by Moody's than by any of the other professional forecasters,
although Moody's, like most of the forecasters, does not confine itself to
any particular method of forecasting.

Cyclical sequence method. This method, an application of which was
dealt with in the section on measurement of lag, is probably the method
in greatest favor among forecasters. Sometimes a forecast is based upon
the correlation of a dependent variable with one independent variable, as
in the case of industrial production and production of durable consumers'
goods with a 2-month lag of the former. Again, multiple correlation is
used, involving several independent variables.

The forecasting index of Bradford B. Smith is an illustration of this 
second method. The Smith index attempts to forecast, by one year, 
changes in the American Telephone and Telegraph Company index. The 
forecasting index is based on a definite hypothesis: that business is good
when money or credit is being used, and bad when money or credit is not
forthcoming or for any other reason is not being spent. Consequently, 
all the series included bear directly on monetary or credit factors. There 
are four series in the index and, according to Smith, they behave charac- 
teristically as follows: 

1. Interest rates. When interest rates are high, long time borrowing
for fixed capital expansion is discouraged; when they are low, such borrowing
is encouraged. Such discouragement or encouragement takes
about a year to translate itself into changes in business activity. The
first series is an average of commercial paper rates and time loan (stock
exchange) rates. Seasonal variations prior to 1914 were removed, and
the series were expressed in terms of per cent deviation from normal,
yields of high-grade long term bonds being regarded as normal. Since a
high interest rate forecasts low business activity, the signs of the interest
series are reversed before being combined into the index.

2. Monetary gold plus Federal Reserve bank holdings of United States
securities. Banks tend to be liberal in their loaning policy when they 
have ample reserves, and they become more strict as their reserves dwindle. 
Naturally, when gold is imported, bank reserves are built up. Also, a 
program of United States security buying by Federal Reserve banks builds 
up bank reserves. These two series, gold and United States securities, 
are therefore added together and expressed as percentages of trend. The 
movements of this series are concurrent with those of interest rates, a year 
ahead of business activity. 

3. Changes in security prices. When security prices are rising, profits, 
financed by expanding bank credit, are spent freely. If the speculative 
fever is high, the high interest rates may retard bond issues and con- 
struction work, but they may not appreciably reduce profit taking and 
profit spending. Likewise, falling security prices bring in their train mar- 
gin calls by brokers, and the providing of that margin uses up cash which 
might otherwise have been spent. The security price changes included 
in the index are those of both stocks and bonds (bonds alone before 1919). 
The series is "the number of points which the securities would rise during
a period of one year if they continued to rise at the same rate which the
trend over the past year would indicate." This series likewise precedes
business activity by one year.

4. Amount of new long term bond flotations. Both corporate and mu- 
nicipal issues are included. The series is the amount issued during the 
12-month period ending with the current month, expressed in terms of 
per cent of trend. The theory involving the use of this series is that new 
bond issues mean new capital goods construction. 

Weights used in combining these relatives were obtained by a process 
of multiple correlation. 
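
The way four such series combine into a single forecasting index can be sketched as follows. This is an illustration under stated assumptions, not Smith's computation: the equal weights are hypothetical (his actual weights, obtained by multiple correlation, are not given here), the figures are invented, and the names `composite_index` and `forecast_dates` are our own.

```python
def composite_index(series, weights, reverse=()):
    """Weighted sum of deviation series; series named in `reverse` have signs flipped."""
    names = list(series)
    length = len(series[names[0]])
    index = []
    for i in range(length):
        total = 0.0
        for name in names:
            value = series[name][i]
            if name in reverse:
                value = -value                   # high interest rates forecast low activity
            total += weights[name] * value
        index.append(total)
    return index


def forecast_dates(index, lead_months=12):
    """Attach each index value to the month it is taken to forecast."""
    return {month + lead_months: value for month, value in enumerate(index)}


# Hypothetical per cent deviations from normal or trend for two successive months.
series = {
    "interest": [10, -5],
    "gold_plus_securities": [2, 4],
    "security_price_change": [-3, 6],
    "bond_flotations": [1, -2],
}
weights = {name: 0.25 for name in series}        # assumed equal weights
index = composite_index(series, weights, reverse=("interest",))
print(forecast_dates(index))                     # {12: -2.5, 13: 3.25}
```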

Smith gives specific warning that no forecasting index is foolproof. Rea- 
sonable confidence may be had in the forecasts only so long as the economic 
relationships and practices persist which suggested the inclusion of the 
series used in the index. Up to the time of the publication of the index, 
Smith's forecasts of the American Telephone and Telegraph index agreed
with that index quite as well as the various business indexes that are generally
accepted agreed among themselves. Smith has not kept his forecasting
index up to date, however, since he believes it to be applicable
only in the "automatic business economy" which existed at the time he
wrote, but not to be applicable to present conditions.

Usually the data to be correlated when making a forecast are cyclical 
relatives; in other instances the data are analyzed in some other fashion. 
For instance, Karsten cumulates deviations of car shortages from their 
average in order to predict interest rates.® Again, a change in general 
business may be predicted by the spread between two series, such as the 
spread between prices of raw cotton and cloth, or between imports and 
exports. Often the relationship may be improved if the relative rather 
than the absolute difference between two series is taken. Bradford B. 
Smith, it will be remembered, includes the ratio of short term interest 
rates to bond yields. 

Another illustration of this technique is the relationship between the
hog-corn price ratio and cycles in hog marketing. According to the United
States Bureau of Agricultural Economics, changes in the relationship of
hog prices to corn prices cause changes in hog production which result in
the hog cycle. As indicated by Chart 256, a period of greater-than-average
hog-corn price ratios results in an increase in hog marketings a
year or two later, whereas a period of smaller-than-average ratios is followed
by a decrease in marketings. The hog cycles as computed by the
Bureau are a 12-month moving average; no adjustment is made for trend.
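
The two computations mentioned here, a price ratio and a 12-month moving average, are simple enough to show directly. The prices below are hypothetical, and the function names are our own; the sketch is not the Bureau's procedure in detail.

```python
def price_ratio(hog_prices, corn_prices):
    """Ratio of the hog price to the corn price, month by month."""
    return [hog / corn for hog, corn in zip(hog_prices, corn_prices)]


def moving_average(values, span=12):
    """Simple moving average; each value covers `span` consecutive months."""
    return [sum(values[i:i + span]) / span for i in range(len(values) - span + 1)]


hog = [9.0] * 12 + [12.0]        # hypothetical hog prices
corn = [0.75] * 13               # hypothetical corn prices
ratios = price_ratio(hog, corn)
smoothed = moving_average(ratios)
print(ratios[-1], [round(value, 2) for value in smoothed])
```

A ratio standing above its long-run average would then be read, on the Bureau's reasoning, as pointing to larger marketings a year or two later.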

® Bradford B. Smith, "A Forecasting Index for Business," Journal of the American
Statistical Association, Vol. XXVI, No. 174, June 1931, pp. 115-127.

® See Karl G. Karsten, "The Theory of Quadrature in Economics," Journal of the
American Statistical Association, Vol. XIX, No. 145, March 1924, p. 14.
Occasionally the forecast may be made not from the turning point of a
series, but from its position in respect to some base line. Haney's P/V line
(commodity prices ÷ volume of trade) is said to forecast business recovery
when it crosses the normal trend on the way up, and depression when
it falls below normal.^ Somewhat akin to this idea is Irving Fisher's
technique of using percentage changes in price. In general, the rate of
price increase is greatest when the price curve crosses normal on the way
up, and the rate of price decrease is greatest when the price curve crosses
normal on the way down. Finally, as was illustrated earlier in the chapter,
the comparison may be between first differences or percentage changes of
both series, either with or without adjustment of the original data for trend.

Chart 256. Cycles in Hog-Corn Price Ratio and Hog Marketings, by Quarters,
1901-1937. (Panels: hog-corn ratio; hog marketings. Quarterly data adjusted for
trend and seasonal variation. Derived from data obtained from the United States
Bureau of Agricultural Economics.)


Cautions in regard to the correlation of time series adjusted for lag have
already been given. It was pointed out that fortuitous co-variation may
appear when there is no economic relationship, and that economic relationships
are gradually, and sometimes suddenly, changing. There may
be a trend in the lag or it may be that series A will lag at one time and
lead at another. Thus, sometimes changes in rates of increase or decrease
may be a cause of changes in business activity, and at other times the
causal relationship may be reversed. A further qualification of the usefulness
of the method should be added. Series A may precede series B
by a certain period of time at revival, but by a different period of time
at recession; and it may even be that, while series A may precede series B
at recessions, series B may typically precede series A at revivals. A variation
of the cyclical sequence method that is simpler than correlation is
merely to compute the average number of months by which a given series
precedes another at the turning points, revival and recession.® These
averages may then be applied in forecasting. It is possible also to present
a graphic picture of the average relationship between several series in a
manner similar to that employed by the Cleveland Trust Company. Chart
257 is based upon two charts published in the Cleveland Trust Company
Business Bulletin (September 15, 1932, and February 15, 1938).

^ See Lewis H. Haney, Business Forecasting, Chapter VIII, Ginn and Co., Boston,
1931. Dr. Haney uses Bradstreet's price index and, for volume of trade, railway
freight tonnage adjusted for trend and seasonal variation.

Through the use of that company’s index of business activity for each 
major depression from 1837 on (13 cycles), a depression index was ob- 
tained by averaging the 13 cycles together for the 12th month prior to the 
lowest point of activity, the 11th month, etc., until an average was ob- 
tained for each of the months beginning 12 months prior to the low point 
and ending 24 months subsequent thereto. The procedure was repeated, 
using the same time periods for bond prices, stock prices, and commodity 
prices. The result is shown in section A of Chart 257, which is redrawn 
from the data of a chart appearing in the Business Bulletin of September 
15, 1932. Chart 257, section B, is similar to a chart appearing in the 
Business Bulletin of February 15, 1938. The construction of the different 
series is the same, except that the point of reference is the month during 
which the business index crossed the normal line on the way down, and the
data extend back for the 24 months preceding and the 12 months following
this base month. The period of time included in the two charts is of necessity
slightly different. According to these charts, a business upturn is
preceded by a simultaneous advance in stock and bond prices, and is fol- 
lowed by an upturn in commodity prices. On the downswing the sequence 
is the same, except that bond prices definitely precede stock prices. 

It should be noted that this method of computing lag is not the same 
as Mitchell’s method. The average number of months' lag (Mitchell) is
not necessarily the same as the number of months by which the averaged 
series lags (Cleveland Trust). The correlation method might give still 
different results. 
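
A small numerical illustration shows how the two measures can disagree. The figures are hypothetical: three short windows of one series, each aligned so that position 2 is the month in which business activity reached its low. The function names are our own, and real work would use the longer windows described above.

```python
def average_by_position(windows):
    """Average several equal-length windows position by position (Cleveland Trust style)."""
    count = len(windows)
    return [sum(column) / count for column in zip(*windows)]


def low_point(values):
    """Position of the lowest value (the first one, if there is a tie)."""
    return min(range(len(values)), key=lambda i: values[i])


reference = 2                     # position of the business low in every window
windows = [
    [5, 1, 4, 6, 7],              # series turns 1 month before business
    [9, 8, 7, 2, 6],              # series turns 1 month after business
    [6, 5, 4, 3, 1],              # series turns 2 months after business
]

# Average of the separate lags (the simpler method described earlier).
lags = [low_point(window) - reference for window in windows]
average_lag = sum(lags) / len(lags)

# Lag of the averaged series.
averaged = average_by_position(windows)
lag_of_average = low_point(averaged) - reference

print(lags, round(average_lag, 2), lag_of_average)   # [-1, 1, 2] 0.67 1
```

Here the average lag is two thirds of a month, while the averaged series lags by a full month, because a deep trough in one cycle dominates the average.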

Chart 257. Some Cyclical Sequences as Determined by the Cleveland Trust Company.
(The vertical scales of the different sections of this chart are not shown, since
they are not strictly comparable in an absolute sense. They were so chosen as to make
the amplitude of fluctuation of each part approximately the same. Based on charts
appearing in The Cleveland Trust Company Business Bulletin, September 15, 1932
and February 15, 1938.)

® See Wesley C. Mitchell, Business Cycles, The Problem and Its Setting, p. 337n.,
National Bureau of Economic Research, New York, 1927.

Cross-cut analysis. This method is based upon the theory that no two
cycles are identical, but that like causes always produce like results. All
the factors bearing upon a given situation are assembled, and, relying
upon his knowledge of economic processes, the forecaster concludes whether
the situation is favorable or unfavorable. The Standard Statistics Company
relies heavily upon this method. Although the method is essentially
non-statistical,^ it is possible to develop a statistical technique by
assigning weights to each factor and then counting the score to see whether
the net result is favorable or unfavorable.

^ Charles O. Hardy and Garfield V. Cox, Forecasting Business Conditions, Chapter X,
Macmillan Co., New York, 1927.

The United Business Service has used an interesting type of cross-cut
analysis. Instead of marshalling an array of factors bearing upon a given
situation, opinions of authorities are assembled and listed each week. A
decision is then rendered which is based on the weighted opinions of these
authorities modified by the service's own conclusions.
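
The scoring idea, whether applied to factors or to the opinions of authorities, amounts to a weighted count. The sketch below is illustrative only; the factors, weights, and ratings are hypothetical, and neither service named above is known to have used this exact scheme.

```python
def cross_cut_score(factors):
    """Sum of weight x rating; rating is +1 (favorable), -1 (unfavorable), or 0 (neutral)."""
    return sum(weight * rating for weight, rating in factors.values())


# Hypothetical factors: (weight assigned by the analyst, rating of the present situation).
factors = {
    "interest rates": (3, +1),
    "commodity prices": (2, -1),
    "construction contracts": (2, +1),
    "inventories": (1, -1),
}

score = cross_cut_score(factors)
verdict = "favorable" if score > 0 else "unfavorable" if score < 0 else "neutral"
print(score, verdict)             # 2 favorable
```

The weights themselves remain a matter of judgment, so the score is no more objective than the knowledge of economic processes behind it.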

A general caution. Some forecasting devices are built up empirically. 
A large number of series are compared with the series being forecast, and 
series which coincide best with that series in a statistical sense are selected, 
regardless of whether there is any logical cause-and-effect relation. For 
instance, if it should be found that changes in the anthrax death rate 
tended regularly to precede turning points in American business, this series 
might be included in the forecasting index. Since the relationship, if any, 
was quite accidental, no confidence should be placed in such a forecasting 
device. It is unwise to forecast upon the basis of statistical data without 
also obtaining a broad factual knowledge of the changes and developments 
under way in the field of activity under consideration. A knowledge of 
underlying economic processes is of basic importance and is essential to 
the analyst. The statistician who is searching for a magic formula that 
will enable him to forecast automatically is foredoomed to disappointment. 
Although it is perhaps a slight exaggeration, it may not be out of place to 
borrow from Dante and issue the following warning to those who approach 
the portals of prophecy: "Abandon hope, all ye who enter here."
Selected List of Readily Available Sources


General 

(Statistical data covering nearly every field will be found in these general 
publications.) 

Statistical Abstract of the United States. Annual. Bureau of the Census.
The Statesman’s Yearbook. Annual. Macmillan and Co., Ltd., London.
The World Almanac and Book of Facts. Annual. New York World-Telegram
Co., New York.

Statistical Yearbook of the League of Nations. Annual. League of Nations,
Geneva.

Survey of Current Business. Monthly with weekly supplements. Annual 
supplements also are occasionally issued. Bureau of Foreign and 
Domestic Commerce. 

Federal Reserve Bulletin. Monthly. Board of Governors of the Federal 
Reserve System. 

Standard Trade and Securities: Basic Statistics and monthly bulletins. 
Standard Statistics Co. 

Monthly Bulletin of Statistics of the League of Nations. Monthly. League 
of Nations, Geneva. 

Periodicals, such as: 

The Annalist. Weekly. The New York Times Co. 

Barron's. Weekly. Barron's Publishing Co.

Business Week. Weekly. McGraw-Hill Publishing Co. 

The Magazine of Wall Street. Bi-weekly. The Ticker Publishing Co.

Daily newspapers.

Commodities: Prices, Production, Consumption, Stocks, Exports, and Imports

1. Census of Agriculture. Quinquennial. Bureau of the Census. 

2. Yearbook of Agriculture. Annual, 1894-1935. Department of Agriculture.
(Since 1935, statistical material has not been included but
has been transferred to Agricultural Statistics.)

3. Agricultural Statistics. Annual. Department of Agriculture.

4. Crops and Markets. Monthly. Department of Agriculture. 

5. Special studies of the United States Bureau of Agricultural Economics
and of the various state agricultural experiment stations.

6. Commerce Yearbook, Vol. I, United States. Annual. Bureau of Foreign
and Domestic Commerce. (Discontinued after 1932.)

7. Foreign Commerce Yearbook. Annual. Bureau of Foreign and Domestic
Commerce. (Formerly Commerce Yearbook, Vol. II, Foreign
Countries.)

8. Monthly Summary of Foreign Commerce. Monthly. Bureau of Foreign
and Domestic Commerce.

9. Foreign Commerce and Navigation of the United States. Annual. Bu- 
reau of Foreign and Domestic Commerce. 

10. Foreign Trade of the United States. Annual. Bureau of Foreign and 
Domestic Commerce. 

11. The Balance of International Payments of the United States. Annual. 
Bureau of Foreign and Domestic Commerce. 

12. Commerce Reports. Weekly. Bureau of Foreign and Domestic 
Commerce. 

13. Mineral Resources of the United States. Annual, 1883-1932. Geo- 
logical Survey. 

14. Minerals Yearbook. Annual beginning 1933. (Supersedes Mineral 
Resources of the United States.) Bureau of Mines. 

15. Census of Mines and Quarries. Decennial. Bureau of the Census. 

16. Census of Manufactures. Biennial. Bureau of the Census.

17. Census of Distribution, 1930. Bureau of the Census.

18. Census of American Business, 1933. Bureau of the Census.

19. Census of Business, 1935. Bureau of the Census. 

20. Wholesale Prices. Weekly, monthly, and special bulletins. United 
States Bureau of Labor Statistics. 

21. Retail Prices. Bi-weekly, monthly, and special bulletins. United 
States Bureau of Labor Statistics. 

22. Changes in Cost of Living. Quarterly. United States Bureau of 
Labor Statistics. 

23. Record Books of Business Statistics, issued in conjunction with Survey 
of Current Business. Booklets on textiles, metals and machinery, 
fuels, automobiles, and rubber issued to date. 

24. Consumer Market Data Handbook. Annual. Bureau of Foreign and 
Domestic Commerce. 

25. Sales Management Survey of Buying Power. Annual. Sales Management.

26. Income in the United States. Published annually in recent years.
Bureau of Foreign and Domestic Commerce.

Financial — Money, Banking, Securities, Interest Rates, Taxation, etc. 

1. Bulletins of the individual Federal Reserve banks. 

2. Bulletins of various large banks. 

3. Annual Report of the Board of Governors of the Federal Reserve System. 

4. Annual Report of the Comptroller of the Currency. 

5. Annual reports of the banking departments of various states. 

6. Annual Report of the Federal Deposit Insurance Corporation. Annual. 

7. Assets and Liabilities of Operating Insured Banks. Reports on call 
dates. Federal Deposit Insurance Corporation. 

8. Dun and Bradstreet Monthly Review. Monthly. Dun & Brad- 
street, Inc. 

9. Commercial and Financial Chronicle. Weekly. William B. Dana 
Company. 

10. Statistics of Income. Annual. Bureau of Internal Revenue. 

11. Financial Statistics of States. Annual. Bureau of the Census. 

12. Financial Statistics of Cities. Annual. Bureau of the Census.

Business Records of Individual Concerns 

1. Standard Corporation Records. Daily. Standard Statistics Company. 

2. Moody's Manual of Investments. (Industrials, railroads, public utilities,
governments, banks, etc.) Annual, and bi-weekly bulletins.
Moody's Investors Service.

3. Poor's Annual. (Industrials; railroads; public utilities; banks, governments,
municipals, investment trusts, real estate, mortgage,
finance, and insurance companies.) Annual. Poor's Publishing
Company.

4. Insurance Yearbook. (Life; fire and marine; casualty, surety, and 
miscellaneous.) Annual. The Spectator Company. 

5. Reports of insurance commissioners of various states, especially 
New York. 

6. Annual reports to stockholders of various corporations. 

Employment, Wages, and Hours of Labor 

1. Monthly Labor Review. Monthly. United States Bureau of Labor 
Statistics. 

2. Bulletins of various state bureaus of labor or industrial commissions. 

3. Special bulletins of the United States Bureau of Labor Statistics. 
(Wages and hours of labor, employment and unemployment, productivity
of labor, etc.)

4. Special bulletins of the Women’s Bureau. 

5. Census of Unemployment (1930-1931 and 1937). Bureau of the 
Census. 

6. Census of Occupations. Decennial. Bureau of the Census.

Miscellaneous

1. Statistics of Railways in the United States. Annual. Interstate
Commerce Commission.

2. Report on Value of Water Borne Foreign Commerce. Annual. United 
States Maritime Commission. (Until 1936 by United States Ship- 
ping Board.) 

3. Annual Report of the Immigration and Naturalization Service. De- 
partment of Justice. 

4. Mortality Statistics. Annual. Bureau of the Census. 

5. Domestic Commerce. Monthly. Bureau of Foreign and Domestic 
Commerce. 

6. Census of Distribution (1930). Bureau of the Census. (Includes not 
only retail and wholesale trade, but also distribution of agricultural 
commodities and census of construction industry, hotels, etc.) 

7. Census of the United States. Decennial. Bureau of the Census. 

8. Various monographs, and annual and special studies of the Bureau 
of the Census, and the Bureau of Foreign and Domestic Commerce. 

9. Bulletins of bureaus of business research of various universities. 

10. Religious Bodies, 1906, 1916, 1926. Bureau of the Census. 

In addition to the above sources, statistical information concerning
specific industries may be had from trade papers and trade associations.
Lists of trade papers may be found in Ayer and Son's American Newspaper
Annual and Directory and as an appendix to Thomas' Register of American
Manufacturers. The latter contains also a list of trade associations, found
in the appendix entitled "Commercial Organizations." The Classified List
of Trade and Allied Associations and Publications in New York City, issued
by the Chamber of Commerce of the State of New York, lists both trade
associations and trade papers.

An appendix to the 1936 Annual Supplement of the Survey of Current 
Business gives sources of data under the following headings: 

1. Government departments. 

2. Commercial and trade associations. 

3. Private organizations. 

4. Technical periodicals. 
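Section IX-1

To prove that $\Sigma x = 0$.

Let $x_1 = X_1 - \bar{X}$, $x_2 = X_2 - \bar{X}$, $\ldots$, $x_N = X_N - \bar{X}$.

Then
$$\Sigma x = \Sigma (X - \bar{X}) = \Sigma X - N\bar{X}.$$

But $\bar{X} = \dfrac{\Sigma X}{N}$, so that $\Sigma X = N\bar{X}$.

Therefore $\Sigma x = N\bar{X} - N\bar{X} = 0$.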


Section IX-2 


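To prove that $\bar{X} = \bar{X}_d + \dfrac{\Sigma d}{N}$.

$$\bar{X} = \frac{\Sigma X}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}.$$

Adding and subtracting $\bar{X}_d$,

$$\bar{X} = \bar{X}_d + \frac{(X_1 - \bar{X}_d) + (X_2 - \bar{X}_d) + \cdots + (X_N - \bar{X}_d)}{N}.$$

But, by definition,

$$d_1 = X_1 - \bar{X}_d, \quad d_2 = X_2 - \bar{X}_d, \quad \ldots, \quad d_N = X_N - \bar{X}_d.$$

Then

$$\bar{X} = \bar{X}_d + \frac{d_1 + d_2 + \cdots + d_N}{N} = \bar{X}_d + \frac{\Sigma d}{N}.$$

If each item is weighted by its frequency, the expression is

$$\bar{X} = \bar{X}_d + \frac{\Sigma f d}{N}.$$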
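The assumed-mean identity of Section IX-2 can be checked numerically. The following is a minimal sketch; the data values and the function name `mean_from_assumed` are illustrative assumptions, not taken from the text.

```python
def mean_from_assumed(values, assumed_mean):
    """Arithmetic mean as the assumed mean plus the average deviation from it."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


values = [12, 15, 11, 18, 14]          # hypothetical observations
direct = sum(values) / len(values)     # ordinary arithmetic mean
shortcut = mean_from_assumed(values, assumed_mean=15)
```

Any assumed mean gives the same result; a convenient round value merely keeps the deviations small.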


Section IX-3

To prove that $\bar{X} \geq G$.

Let $X_1$ and $X_2$ be, respectively, the smallest and largest values of a series.

Let $\dfrac{a}{2}$ be the difference between the arithmetic mean and the geometric mean of these values. That is,

$$\frac{X_1 + X_2}{2} = \sqrt{X_1 X_2} + \frac{a}{2},$$

$$X_1 + X_2 = 2\sqrt{X_1 X_2} + a,$$

$$a = X_1 - 2\sqrt{X_1 X_2} + X_2 = \left(\sqrt{X_1} - \sqrt{X_2}\right)^2.$$

Therefore $\dfrac{a}{2}$ is either positive or, if $X_1 = X_2$, $\dfrac{a}{2} = 0$, and

$$\frac{X_1 + X_2}{2} \geq \sqrt{X_1 X_2}.$$

If, now, $X_1$ and $X_2$ are each replaced by $\dfrac{X_1 + X_2}{2}$, the value of the arithmetic mean of the entire series is not affected. The value of the geometric mean is, however, increased because, as shown above, when $X_1 \neq X_2$, $\dfrac{X_1 + X_2}{2} > \sqrt{X_1 X_2}$, and thus the contribution of $\left(\dfrac{X_1 + X_2}{2}\right)^2$ to the geometric mean exceeds the original contribution of $X_1 X_2$. Continually repeating this process for the smallest and largest remaining values results in a continually increasing value of $G$ which approaches $\bar{X}$.


Section IX-4

To prove that $G \geq H$.

Let $X_1$ and $X_2$ be, respectively, the smallest and largest values of a series.

It was shown in the preceding proof that

$$X_1 + X_2 \geq 2\sqrt{X_1 X_2}.$$

Therefore

$$\sqrt{X_1 X_2}\,(X_1 + X_2) \geq 2 X_1 X_2,$$

$$\sqrt{X_1 X_2} \geq \frac{2 X_1 X_2}{X_1 + X_2}.$$

But

$$\frac{2 X_1 X_2}{X_1 + X_2} = \frac{2}{\dfrac{X_1 + X_2}{X_1 X_2}} = \frac{2}{\dfrac{1}{X_1} + \dfrac{1}{X_2}},$$

which is the harmonic mean.

If $X_1$ and $X_2$ are replaced by their harmonic mean, $\dfrac{2 X_1 X_2}{X_1 + X_2}$, the value of $H$ for the entire series is unchanged. However, the value of $G$ is decreased, for it was shown above that $\sqrt{X_1 X_2} > \dfrac{2 X_1 X_2}{X_1 + X_2}$ when $X_1 \neq X_2$, and thus the contribution of $\left(\dfrac{2 X_1 X_2}{X_1 + X_2}\right)^2$ to the geometric mean would be less than the contribution of $X_1 X_2$. Continually repeating this process for the smallest and largest remaining values results in a continually decreasing value of $G$ which approaches $H$.
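The ordering of the three means, and one step of the replacement process used in Section IX-3, can be illustrated numerically. This is a minimal sketch; the series and all function names are hypothetical choices for illustration.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # Computed through logarithms; requires all values to be positive.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def harmonic_mean(xs):
    return len(xs) / sum(1 / x for x in xs)


def replace_extremes_with_am(xs):
    """Replace the smallest and largest values by their arithmetic mean."""
    s = sorted(xs)
    mid = (s[0] + s[-1]) / 2
    return [mid] + s[1:-1] + [mid]


series = [2.0, 4.0, 8.0, 16.0]            # hypothetical positive series
stepped = replace_extremes_with_am(series)
```

After one step the arithmetic mean is unchanged while the geometric mean has risen, which is the mechanism the proof repeats until $G$ approaches $\bar{X}$.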


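Section X-1

To show that $\sigma = \sqrt{\dfrac{\Sigma X^2}{N} - \left(\dfrac{\Sigma X}{N}\right)^2} = \sqrt{\dfrac{\Sigma d^2}{N} - \left(\dfrac{\Sigma d}{N}\right)^2}$.

Since $x = X - \bar{X}$,

$$\sigma = \sqrt{\frac{\Sigma (X - \bar{X})^2}{N}} = \sqrt{\frac{\Sigma (X^2 - 2X\bar{X} + \bar{X}^2)}{N}} = \sqrt{\frac{\Sigma X^2 - 2\bar{X}\,\Sigma X + N\bar{X}^2}{N}}.$$

But since $\bar{X} = \dfrac{\Sigma X}{N}$,

$$\sigma = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}.$$

By definition, $d = X - \bar{X}_d$, or $X = d + \bar{X}_d$. Therefore

$$\sigma = \sqrt{\frac{\Sigma (d + \bar{X}_d)^2}{N} - \left[\frac{\Sigma (d + \bar{X}_d)}{N}\right]^2}$$

$$= \sqrt{\frac{\Sigma (d^2 + 2d\bar{X}_d + \bar{X}_d^2)}{N} - \left(\frac{\Sigma d + N\bar{X}_d}{N}\right)^2}$$

$$= \sqrt{\frac{\Sigma d^2 + 2\bar{X}_d\,\Sigma d + N\bar{X}_d^2}{N} - \frac{(\Sigma d)^2 + 2N\bar{X}_d\,\Sigma d + N^2\bar{X}_d^2}{N^2}}$$

$$= \sqrt{\frac{\Sigma d^2}{N} - \left(\frac{\Sigma d}{N}\right)^2}.$$

For a frequency distribution,

$$\sigma = \sqrt{\frac{\Sigma f d^2}{N} - \left(\frac{\Sigma f d}{N}\right)^2}.$$

Or, with deviations $d'$ in terms of class intervals of width $i$,

$$\sigma = i\sqrt{\frac{\Sigma f d'^2}{N} - \left(\frac{\Sigma f d'}{N}\right)^2}.$$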
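The equivalence of the definitional and assumed-mean forms of $\sigma$ can be confirmed on a small series. This is a minimal sketch with hypothetical data; the function names are illustrative only.

```python
import math


def sd_direct(xs):
    """Standard deviation from deviations about the arithmetic mean (divisor N)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def sd_from_assumed(xs, assumed_mean):
    """Standard deviation from deviations d about an arbitrary assumed mean."""
    n = len(xs)
    d = [x - assumed_mean for x in xs]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)


data = [3, 7, 7, 19]                       # hypothetical observations
sigma = sd_direct(data)
sigma_short = sd_from_assumed(data, assumed_mean=7)
```

Setting the assumed mean to zero reproduces the first form of the section, $\sqrt{\Sigma X^2/N - (\Sigma X/N)^2}$.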


Section X-2 

To show that $\dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{\dfrac{Q_3 - Q_1}{2}}$ must vary within the limits of $\pm 2$.

If extreme values are added to the upper part of a series, $Q_3$ moves away from $Q_2$, and $Q_1$ moves toward $Q_2$. As a limiting condition, assume $Q_1 = Q_2$ and, since $Q_1 - Q_2 = 0$, the expression for skewness becomes

$$\frac{Q_3 - Q_2}{\dfrac{Q_3 - Q_2}{2}} = 2.$$

Similarly, if extreme values are added to the lower end of a series, $Q_1$ moves away from $Q_2$, and $Q_3$ moves toward $Q_2$, resulting in a limiting value of $-2$.


Section X-3 


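To prove that $\dfrac{\Sigma x^3}{N} = \dfrac{\Sigma d^3}{N} - 3\,\dfrac{\Sigma d}{N}\,\dfrac{\Sigma d^2}{N} + 2\left(\dfrac{\Sigma d}{N}\right)^3$.

It was shown in Appendix B, section IX-2, that $\bar{X} = \bar{X}_d + \dfrac{\Sigma d}{N}$.

For any selected $X$ value, say $X_1$, $x_1 = X_1 - \bar{X} = X_1 - \bar{X}_d - \dfrac{\Sigma d}{N}$.

But $X_1 - \bar{X}_d = d_1$; therefore, $x_1 = d_1 - \dfrac{\Sigma d}{N}$.

Similarly, $x_2 = d_2 - \dfrac{\Sigma d}{N}$, $x_3 = d_3 - \dfrac{\Sigma d}{N}$, etc.

Thus,

$$\frac{\Sigma x^3}{N} = \frac{\Sigma \left(d - \dfrac{\Sigma d}{N}\right)^3}{N}$$

$$= \frac{\Sigma d^3 - 3\,\dfrac{\Sigma d}{N}\,\Sigma d^2 + 3\left(\dfrac{\Sigma d}{N}\right)^2 \Sigma d - N\left(\dfrac{\Sigma d}{N}\right)^3}{N}$$

$$= \frac{\Sigma d^3}{N} - 3\,\frac{\Sigma d}{N}\,\frac{\Sigma d^2}{N} + 3\left(\frac{\Sigma d}{N}\right)^3 - \left(\frac{\Sigma d}{N}\right)^3$$

$$= \frac{\Sigma d^3}{N} - 3\,\frac{\Sigma d}{N}\,\frac{\Sigma d^2}{N} + 2\left(\frac{\Sigma d}{N}\right)^3.$$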


Section XI-1 

R. A. Fisher sets forth a set of criteria for testing the normality of a distribution. (Statistical Methods for Research Workers, Seventh Edition, pp. 54-56 and 74-80.) He computes $k_1$, $k_2$, $k_3$, and $k_4$, the last three of which are somewhat similar to $\pi_2$, $\pi_3$, and $\pi_4$ except that degrees of freedom are taken into consideration. Thus:

$$k_1 = \frac{\Sigma X}{N},$$

$$k_2 = \frac{\Sigma x^2}{N - 1},$$

$$k_3 = \frac{N(\Sigma x^3)}{(N - 1)(N - 2)},$$

$$k_4 = \frac{N}{(N - 1)(N - 2)(N - 3)}\left[(N + 1)\Sigma x^4 - \frac{3(N - 1)}{N}(\Sigma x^2)^2\right].$$

(The values of $k_2$ and $k_4$ may be corrected for grouping as follows, when deviations are expressed in class-interval units:

$$k_2' = k_2 - \tfrac{1}{12} \quad \text{and} \quad k_4' = k_4 + \tfrac{1}{120}.$$

No correction is necessary for $k_1$ and $k_3$; hence $k_1' = k_1$ and $k_3' = k_3$.)

From the $k$'s, there may be computed the values of

$$g_1 = \frac{k_3}{k_2\sqrt{k_2}}, \quad \text{a measure of skewness,}$$

and

$$g_2 = \frac{k_4}{k_2^2}, \quad \text{a measure of kurtosis,}$$

which should each be zero for a normal distribution and which are distributed normally for large samples. The variances of $g_1$ and $g_2$ are then determined to ascertain whether $g_1$ and $g_2$ differ significantly from zero. The expressions are:

$$\sigma_{g_1}^2 = \frac{6N(N - 1)}{(N - 2)(N + 1)(N + 3)},$$

$$\sigma_{g_2}^2 = \frac{24N(N - 1)^2}{(N - 3)(N - 2)(N + 3)(N + 5)}.$$
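The $k$ statistics and the derived measures $g_1$ and $g_2$ can be computed directly from ungrouped data. The sketch below follows the expressions of this section without the grouping correction; the function names and the sample are hypothetical, and at least four observations are assumed.

```python
def fisher_k_statistics(xs):
    """Return (k1, k2, k3, k4) for ungrouped data; needs len(xs) >= 4."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s3 = sum((x - mean) ** 3 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    k1 = mean
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = n * ((n + 1) * s4 - 3 * (n - 1) * s2 ** 2 / n) / ((n - 1) * (n - 2) * (n - 3))
    return k1, k2, k3, k4


def g_statistics(xs):
    """Return (g1, g2, var_g1, var_g2) as defined in Section XI-1."""
    n = len(xs)
    _, k2, k3, k4 = fisher_k_statistics(xs)
    g1 = k3 / k2 ** 1.5
    g2 = k4 / k2 ** 2
    var_g1 = 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))
    var_g2 = 24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5))
    return g1, g2, var_g1, var_g2


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]      # hypothetical symmetric sample
k1, k2, k3, k4 = fisher_k_statistics(sample)
g1, g2, var_g1, var_g2 = g_statistics(sample)
```

Dividing $g_1$ and $g_2$ by the square roots of their variances gives the ratios that are referred to the normal curve for large samples.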


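Section XII-1

To prove that $\sigma_{\bar{X}} = \dfrac{\sigma_P}{\sqrt{N}}$ when $P$ is infinite or very large.

Samples of $N$ items each are drawn at random from a population of $P$ items, as indicated below. There are ${}_P C_N$ such samples.

Item    Sample 1    Sample 2    Sample 3    ...
a       $X_{a1}$    $X_{a2}$    $X_{a3}$
b       $X_{b1}$    $X_{b2}$    $X_{b3}$
c       $X_{c1}$    $X_{c2}$    $X_{c3}$
...
N       $X_{N1}$    $X_{N2}$    $X_{N3}$

Letting $x$ represent a deviation from the population mean, we have $x_{a1} = X_{a1} - \bar{X}_P$, $x_{b1} = X_{b1} - \bar{X}_P$, $\ldots$, $x_{N1} = X_{N1} - \bar{X}_P$, $x_{a2} = X_{a2} - \bar{X}_P$, etc. We shall designate the various items as $\bar{X}_P + x_{a1}$, $\bar{X}_P + x_{b1}$, $\ldots$, $\bar{X}_P + x_{N1}$, $\bar{X}_P + x_{a2}$, etc.

For sample 1: $\Sigma X_1 = N\bar{X}_P + \Sigma x_1$,

For sample 2: $\Sigma X_2 = N\bar{X}_P + \Sigma x_2$, etc.

Since adding a constant to (or subtracting a constant from) a series of values does not alter the value of $\sigma$, we have

$$\sigma_{\Sigma X} = \sigma_{\Sigma x}$$

(where $x$ is a deviation from $\bar{X}_P$ and therefore, for any one sample, in general $\sum_a^N x \neq 0$), and therefore

$$\sigma_{\Sigma X}^2 = \frac{\sum_1^{{}_P C_N}\left(\sum_a^N x\right)^2}{{}_P C_N} - \left(\frac{\sum_1^{{}_P C_N}\sum_a^N x}{{}_P C_N}\right)^2 = \frac{\sum_1^{{}_P C_N}\left(\sum_a^N x\right)^2}{{}_P C_N},$$

since $\sum_1^{{}_P C_N}\sum_a^N x = \Sigma x_1 + \Sigma x_2 + \cdots + \Sigma x_{{}_P C_N} = 0$, and

$${}_P C_N\,\sigma_{\Sigma X}^2 = \sum_1^{{}_P C_N}\left(\sum_a^N x\right)^2 = \sum_1^{{}_P C_N}(x_a + x_b + x_c + \cdots + x_N)^2.$$

Now for any one sample, for example sample 1,

$$\left(\sum_a^N x\right)^2 = (x_a + x_b + x_c + \cdots + x_N)^2$$

$$= x_a^2 + x_a x_b + x_a x_c + \cdots + x_a x_N$$
$$+\; x_a x_b + x_b^2 + x_b x_c + \cdots + x_b x_N$$
$$+\; x_a x_c + x_b x_c + x_c^2 + \cdots + x_c x_N + \cdots$$
$$+\; x_a x_N + x_b x_N + x_c x_N + \cdots + x_N^2$$

$$= \sum_a^N x^2 + 2\sum_a^N x_i x_j,$$

where $x_i x_j$ represents the product resulting from each combination of two different items. Therefore

$${}_P C_N\,\sigma_{\Sigma X}^2 = \sum_1^{{}_P C_N}\left(\sum_a^N x^2 + 2\sum_a^N x_i x_j\right).$$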


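Since there are ${}_P C_N$ samples each containing $\dfrac{N}{P}$ of the population, any given item ($x_i$) will occur in $\dfrac{N}{P}$ of the samples, or $\dfrac{N}{P}\,{}_P C_N$ times. Thus each $x_i^2$ will occur $\dfrac{N}{P}\,{}_P C_N$ times. Now if a given item ($x_i$) occurs in $\dfrac{N}{P}$ of the samples, a second item ($x_j$) will occur in $\dfrac{N - 1}{P - 1}$ of the samples in which the first occurs, and both of these items will occur in $\dfrac{N(N - 1)}{P(P - 1)}$ of all the samples, or $\dfrac{N(N - 1)}{P(P - 1)}\,{}_P C_N$ times. Thus each $x_i x_j$ will occur $\dfrac{N(N - 1)}{P(P - 1)}\,{}_P C_N$ times.

Therefore

$${}_P C_N\,\sigma_{\Sigma X}^2 = \frac{N}{P}\,{}_P C_N \sum_P x^2 + 2\,\frac{N(N - 1)}{P(P - 1)}\,{}_P C_N \sum_P x_i x_j,$$

where $\sum_P$ indicates a summation over the entire population, and

$$\sigma_{\Sigma X}^2 = \frac{N}{P}\sum_P x^2 + 2\,\frac{N(N - 1)}{P(P - 1)}\sum_P x_i x_j.$$

Now

$$\left(\sum_P x\right)^2 = \sum_P x^2 + 2\sum_P x_i x_j$$

by a development similar to that shown above for $\left(\sum_a^N x\right)^2$, and

$$2\sum_P x_i x_j = \left(\sum_P x\right)^2 - \sum_P x^2.$$

But $\sum_P x = 0$; therefore $2\sum_P x_i x_j = -\sum_P x^2$, and

$$\sigma_{\Sigma X}^2 = \frac{N}{P}\sum_P x^2 - \frac{N(N - 1)}{P(P - 1)}\sum_P x^2$$

$$= \frac{N}{P}\,P\sigma_P^2 - \frac{N(N - 1)}{P(P - 1)}\,P\sigma_P^2$$

$$= N\sigma_P^2\left[1 - \frac{N - 1}{P - 1}\right]$$

$$= N\sigma_P^2\left[\frac{P - 1 - N + 1}{P - 1}\right]$$

$$= N\sigma_P^2\,\frac{P - N}{P - 1},$$

$$\sigma_{\Sigma X} = \sqrt{N}\,\sigma_P\sqrt{\frac{P - N}{P - 1}}.$$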

If each sample contains $N$ items, each deviation of a sample sum from the mean of the sample sums is $N$ times as large as each deviation of a sample mean from the mean of the sample means, each squared deviation of a sample sum is $N^2$ times the squared deviation of each sample mean, and the standard deviation of a series of sums is $N$ times the standard deviation of a series of means. Since all possible samples of size $N$ were selected, the mean of the sample means $= \bar{X}_P$. Therefore, dividing each side of the equation by $N$, we have

$$\sigma_{\bar{X}} = \frac{\sigma_P}{\sqrt{N}}\sqrt{\frac{P - N}{P - 1}}.$$

If 1 is negligible in relation to $P$,

$$\sigma_{\bar{X}} = \frac{\sigma_P}{\sqrt{N}}\sqrt{1 - \frac{N}{P}}.$$

If $P$ is infinite or if $P$ is very large in relation to $N$, the expression becomes

$$\sigma_{\bar{X}} = \frac{\sigma_P}{\sqrt{N}}.$$
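The finite-population result can be verified by brute force on a small population, enumerating every possible sample exactly as the proof supposes. This is a minimal sketch; the population values and variable names are hypothetical.

```python
from itertools import combinations
import math


def pop_sd(xs):
    """Standard deviation with divisor N (the whole series is the population)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


population = [2, 5, 7, 10, 16]             # hypothetical population, P = 5
P, N = len(population), 3

# Means of all possible samples of N items drawn without replacement.
sample_means = [sum(s) / N for s in combinations(population, N)]

observed = pop_sd(sample_means)
predicted = pop_sd(population) / math.sqrt(N) * math.sqrt((P - N) / (P - 1))
```

The two values agree to rounding error, and the mean of `sample_means` equals the population mean, as the argument above requires.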


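Section XII-2

To prove that $\hat{\sigma} = \sqrt{\dfrac{\Sigma x^2}{N - 1}}$.

For any sample composed of items $X_a, X_b, X_c, \ldots, X_N$,

$$\sigma^2 = \frac{\sum_a^N (X - \bar{X})^2}{N} = \frac{\sum_a^N X^2 - 2\bar{X}\sum_a^N X + N\bar{X}^2}{N}$$

$$= \frac{\sum_a^N X^2 - \dfrac{2\left(\sum_a^N X\right)^2}{N} + \dfrac{\left(\sum_a^N X\right)^2}{N}}{N}$$

$$= \frac{\sum_a^N X^2 - \dfrac{\left(\sum_a^N X\right)^2}{N}}{N} = \frac{N\sum_a^N X^2 - \left(\sum_a^N X\right)^2}{N^2}.$$

Following the demonstration given in the preceding proof,

$$\left(\sum_a^N X\right)^2 = (X_a + X_b + X_c + \cdots + X_N)^2 = \sum_a^N X^2 + 2\sum_a^N X_i X_j.$$

Therefore

$$\sigma^2 = \frac{N\sum_a^N X^2 - \left(\sum_a^N X^2 + 2\sum_a^N X_i X_j\right)}{N^2}$$

$$= \frac{N\sum_a^N X^2 - \sum_a^N X^2 - 2\sum_a^N X_i X_j}{N^2}$$

$$= \frac{(N - 1)\sum_a^N X^2 - 2\sum_a^N X_i X_j}{N^2}.$$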

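Now there are ${}_P C_N$ possible samples, each consisting of $N$ items, taken from a population of $P$ items. Therefore, using $[\sigma^2]$ to designate the mean of the variances of all the samples,

$$[\sigma^2] = \frac{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_{{}_P C_N}^2}{{}_P C_N} = \frac{1}{{}_P C_N\,N^2}\sum_1^{{}_P C_N}\left[(N - 1)\sum_a^N X^2 - 2\sum_a^N X_i X_j\right].$$

As each sample consists of $N$ items, it contains $\dfrac{N}{P}$ of the population and any given item ($X_i$) will occur in $\dfrac{N}{P}$ of the samples. But, since there are ${}_P C_N$ samples, each item will occur altogether $\dfrac{N}{P}\,{}_P C_N$ times and each $X_i^2$ will occur $\dfrac{N}{P}\,{}_P C_N$ times. Now, if $X_i$ will occur in $\dfrac{N}{P}$ of the samples, any other given item ($X_j$) will occur in $\dfrac{N - 1}{P - 1}$ of the samples in which $X_i$ occurs, and $X_i$ and $X_j$ will both occur in $\dfrac{N(N - 1)}{P(P - 1)}$ of the samples. Since there are ${}_P C_N$ samples, $X_i$ and $X_j$ will both occur in the same sample $\dfrac{N(N - 1)}{P(P - 1)}\,{}_P C_N$ times. Therefore the product $X_i X_j$ will occur $\dfrac{N(N - 1)}{P(P - 1)}\,{}_P C_N$ times. Thus

$$[\sigma^2] = \frac{(N - 1)\dfrac{N}{P}\,{}_P C_N\sum_P X^2 - 2\,\dfrac{N(N - 1)}{P(P - 1)}\,{}_P C_N\sum_P X_i X_j}{{}_P C_N\,N^2}$$

$$= \frac{N - 1}{N}\cdot\frac{\sum_P X^2}{P} - \frac{N - 1}{NP(P - 1)}\cdot 2\sum_P X_i X_j.$$

It was previously shown that

$$\left(\sum_a^N X\right)^2 = \sum_a^N X^2 + 2\sum_a^N X_i X_j.$$

Similarly,

$$\left(\sum_P X\right)^2 = \sum_P X^2 + 2\sum_P X_i X_j, \quad \text{and} \quad 2\sum_P X_i X_j = \left(\sum_P X\right)^2 - \sum_P X^2.$$

Therefore

$$[\sigma^2] = \frac{N - 1}{N}\cdot\frac{\sum_P X^2}{P} - \frac{N - 1}{NP(P - 1)}\left[\left(\sum_P X\right)^2 - \sum_P X^2\right]$$

$$= \frac{N - 1}{N}\cdot\frac{\sum_P X^2}{P} + \frac{N - 1}{N(P - 1)}\cdot\frac{\sum_P X^2}{P} - \frac{P(N - 1)}{N(P - 1)}\left(\frac{\sum_P X}{P}\right)^2$$

$$= \left[\frac{N - 1}{N} + \frac{N - 1}{N(P - 1)}\right]\frac{\sum_P X^2}{P} - \frac{P(N - 1)}{N(P - 1)}\left(\frac{\sum_P X}{P}\right)^2.$$

But

$$\frac{N - 1}{N} + \frac{N - 1}{N(P - 1)} = \frac{(N - 1)(P - 1) + (N - 1)}{N(P - 1)} = \frac{PN - N - P + 1 + N - 1}{N(P - 1)} = \frac{PN - P}{N(P - 1)} = \frac{P(N - 1)}{N(P - 1)}.$$

Therefore

$$[\sigma^2] = \frac{P(N - 1)}{N(P - 1)}\cdot\frac{\sum_P X^2}{P} - \frac{P(N - 1)}{N(P - 1)}\left(\frac{\sum_P X}{P}\right)^2$$

$$= \frac{P(N - 1)}{N(P - 1)}\left[\frac{\sum_P X^2}{P} - \left(\frac{\sum_P X}{P}\right)^2\right] = \frac{P(N - 1)}{N(P - 1)}\,\sigma_P^2.$$

As $P$ approaches infinity, $\dfrac{P}{P - 1}$ approaches 1 and

$$[\sigma^2] = \frac{N - 1}{N}\,\sigma_P^2, \quad \text{or} \quad \sigma_P^2 = [\sigma^2]\,\frac{N}{N - 1}.$$

But $\sigma^2$ from our sample is the only available estimate of $[\sigma^2]$, the mean of the variances of all the samples. Therefore, designating our estimate of $\sigma_P^2$ by $\hat{\sigma}^2$,

$$\hat{\sigma}^2 = \sigma^2\,\frac{N}{N - 1} = \frac{\Sigma x^2}{N}\cdot\frac{N}{N - 1} = \frac{\Sigma x^2}{N - 1},$$

$$\hat{\sigma} = \sqrt{\frac{\Sigma x^2}{N - 1}}.$$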
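The intermediate result $[\sigma^2] = \dfrac{P(N-1)}{N(P-1)}\,\sigma_P^2$ can likewise be checked by enumerating all samples from a small population. This is a minimal sketch under the same assumptions as the proof (sampling without replacement, divisor $N$ in each variance); the data and names are hypothetical.

```python
from itertools import combinations


def variance(xs):
    """Variance with divisor N."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


population = [2, 5, 7, 10, 16]             # hypothetical population, P = 5
P, N = len(population), 3

# Variance of every possible sample, then the mean of those variances.
sample_variances = [variance(s) for s in combinations(population, N)]
mean_of_variances = sum(sample_variances) / len(sample_variances)

predicted = P * (N - 1) / (N * (P - 1)) * variance(population)
```

The mean sample variance falls short of the population variance, which is why the estimate $\hat{\sigma}^2$ uses the divisor $N - 1$ when $P$ is large.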


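Section XII-3

To show that $\hat{\sigma} = \sqrt{\dfrac{\Sigma X^2}{N - 1} - \dfrac{(\Sigma X)^2}{N(N - 1)}}$.

$$\hat{\sigma}^2 = \frac{\Sigma x^2}{N - 1} = \frac{\Sigma (X - \bar{X})^2}{N - 1}$$

$$= \frac{\Sigma (X^2 - 2X\bar{X} + \bar{X}^2)}{N - 1}$$

$$= \frac{\Sigma X^2 - 2\bar{X}\,\Sigma X + N\bar{X}^2}{N - 1}.$$

But $N\bar{X} = \Sigma X$. Therefore

$$\hat{\sigma}^2 = \frac{\Sigma X^2 - 2\bar{X}\,\Sigma X + \bar{X}\,\Sigma X}{N - 1}$$

$$= \frac{\Sigma X^2 - \bar{X}\,\Sigma X}{N - 1}$$

$$= \frac{\Sigma X^2 - \dfrac{(\Sigma X)^2}{N}}{N - 1}$$

$$= \frac{\Sigma X^2}{N - 1} - \frac{(\Sigma X)^2}{N(N - 1)}, \quad \text{and}$$

$$\hat{\sigma} = \sqrt{\frac{\Sigma X^2}{N - 1} - \frac{(\Sigma X)^2}{N(N - 1)}}.$$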


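Section XII-4

To prove that $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$.

Let $X_{a1}, X_{a2}$; $X_{b1}, X_{b2}$; $\ldots$; $X_{N1}, X_{N2}$ represent two series of sample values which may be inherently paired or which may have been arbitrarily paired by the process of selection. Thus the standard deviation of the difference between the two series may be computed as follows:

Item    Series 1    Series 2    Difference             Deviation from mean of the differences*
a       $X_{a1}$    $X_{a2}$    $X_{a1} - X_{a2}$      $(X_{a1} - X_{a2}) - (\bar{X}_1 - \bar{X}_2)$
b       $X_{b1}$    $X_{b2}$    $X_{b1} - X_{b2}$      $(X_{b1} - X_{b2}) - (\bar{X}_1 - \bar{X}_2)$
c       $X_{c1}$    $X_{c2}$    $X_{c1} - X_{c2}$      $(X_{c1} - X_{c2}) - (\bar{X}_1 - \bar{X}_2)$
...
N       $X_{N1}$    $X_{N2}$    $X_{N1} - X_{N2}$      $(X_{N1} - X_{N2}) - (\bar{X}_1 - \bar{X}_2)$

* The mean of the differences is equivalent to the difference between the means; thus $\dfrac{\Sigma (X_1 - X_2)}{N} = \dfrac{\Sigma X_1}{N} - \dfrac{\Sigma X_2}{N} = \bar{X}_1 - \bar{X}_2$.

$$\sigma_{X_1 - X_2}^2 = \frac{\Sigma [(X_1 - X_2) - (\bar{X}_1 - \bar{X}_2)]^2}{N}$$

$$= \frac{\Sigma [(X_1 - \bar{X}_1) - (X_2 - \bar{X}_2)]^2}{N}$$

$$= \frac{\Sigma (x_1 - x_2)^2}{N}$$

$$= \frac{\Sigma x_1^2}{N} - \frac{2\Sigma x_1 x_2}{N} + \frac{\Sigma x_2^2}{N}.$$

But $r_{x_1 x_2} = \dfrac{\Sigma x_1 x_2}{N\sigma_{x_1}\sigma_{x_2}}$ (see Chapter XXII), and therefore

$$\frac{\Sigma x_1 x_2}{N} = r_{x_1 x_2}\,\sigma_{x_1}\sigma_{x_2}.$$

Therefore

$$\sigma_{X_1 - X_2}^2 = \sigma_{x_1}^2 + \sigma_{x_2}^2 - 2r_{x_1 x_2}\,\sigma_{x_1}\sigma_{x_2}, \quad \text{and}$$

$$\sigma_{X_1 - X_2} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 - 2r_{x_1 x_2}\,\sigma_{x_1}\sigma_{x_2}}.$$

When the items of two series are inherently paired and when correlation is present, the above form may be used. However, if the samples of series 1 and series 2 have been drawn independently of each other, $r = 0$, and

$$\sigma_{X_1 - X_2} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}.$$

If $X_{a1}, X_{a2}$; $X_{b1}, X_{b2}$; etc. are now taken to be sample means instead of single observations, the expression becomes

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2 - 2r_{\bar{X}_1 \bar{X}_2}\,\sigma_{\bar{X}_1}\sigma_{\bar{X}_2}}$$

if there is correlation between the paired sample means, and

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$$

if there is no correlation between the paired sample means.

By a procedure similar to the above, it may be shown that

$$\sigma_{X_1 + X_2} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 + 2r_{x_1 x_2}\,\sigma_{x_1}\sigma_{x_2}}$$

and, when $r = 0$,

$$\sigma_{X_1 + X_2} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}.$$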


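Section XII-5

To show that $\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{(N_1 + N_2)(\Sigma x_1^2 + \Sigma x_2^2)}{N_1 N_2[(N_1 - 1) + (N_2 - 1)]}}$.

The estimated variance of the population from which the first sample was drawn is $\hat{\sigma}_1^2 = \dfrac{\Sigma x_1^2}{N_1 - 1}$, while that from which the second sample was drawn is $\hat{\sigma}_2^2 = \dfrac{\Sigma x_2^2}{N_2 - 1}$. These expressions are each ratios, and it was demonstrated in Chapter VII that ratios based on different $N$'s can be averaged correctly only when properly weighted. The simplest way to accomplish the weighting is to divide the total of the original dividends by the total of the original divisors in both series. Therefore, in averaging the two variances, we total the squared deviations ($\Sigma x_1^2 + \Sigma x_2^2$) and divide by the total degrees of freedom $[(N_1 - 1) + (N_2 - 1)]$. We have, then,

$$\hat{\sigma}_{1+2}^2 = \frac{\Sigma x_1^2 + \Sigma x_2^2}{(N_1 - 1) + (N_2 - 1)}.$$

The expression for the standard error of the difference between two means is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\hat{\sigma}_1^2}{N_1} + \frac{\hat{\sigma}_2^2}{N_2}}.$$

If, instead of employing two estimates of variance, $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$, we use a single estimate $\hat{\sigma}_{1+2}^2$, this becomes

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\hat{\sigma}_{1+2}^2}{N_1} + \frac{\hat{\sigma}_{1+2}^2}{N_2}}$$

$$= \sqrt{\frac{N_2\hat{\sigma}_{1+2}^2 + N_1\hat{\sigma}_{1+2}^2}{N_1 N_2}}$$

$$= \sqrt{\hat{\sigma}_{1+2}^2\,\frac{N_1 + N_2}{N_1 N_2}}$$

$$= \sqrt{\frac{N_1 + N_2}{N_1 N_2}\cdot\frac{\Sigma x_1^2 + \Sigma x_2^2}{(N_1 - 1) + (N_2 - 1)}}$$

$$= \sqrt{\frac{(N_1 + N_2)(\Sigma x_1^2 + \Sigma x_2^2)}{N_1 N_2[(N_1 - 1) + (N_2 - 1)]}}$$

$$= \sqrt{\frac{(N_1 + N_2)(\Sigma x_1^2 + \Sigma x_2^2)}{N_1 N_2(N_1 + N_2 - 2)}}.$$

When $N_1 = N_2 = N$, the result is the same whether we employ $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ or $\hat{\sigma}_{1+2}^2$, since

$$\hat{\sigma}_{1+2}^2 = \frac{\Sigma x_1^2 + \Sigma x_2^2}{2N - 2},$$

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{2N}{N^2}\cdot\frac{\Sigma x_1^2 + \Sigma x_2^2}{2N - 2}}$$

$$= \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2}{N(N - 1)}}$$

$$= \sqrt{\frac{\Sigma x_1^2}{N(N - 1)} + \frac{\Sigma x_2^2}{N(N - 1)}}$$

$$= \sqrt{\frac{\hat{\sigma}_1^2}{N} + \frac{\hat{\sigma}_2^2}{N}}.$$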

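Section XII-6

To show that, for a proportionally stratified sample,

$$\sigma_{\bar{X}\ \text{of a stratified sample}}^2 = \frac{\sigma_P^2}{N} - \frac{\sigma_{\text{strata means}}^2}{N}.$$

Let there be $a, b, \ldots, P_S$ items in a stratum of the population.

Let there be $1, 2, 3, \ldots, m$ strata and let $S$ designate a particular stratum.

A stratified sample of $N$ items is to be selected from a population of $P$ items so that the number of items selected at random from each stratum ($N_S$) represents the same proportion of the entire sample ($N$) that the number of items in the stratum ($P_S$) bears to the population ($P$). That is, $N_S : N :: P_S : P$.

If we know the true variance of each stratum of the population ($\sigma_S^2$), we can, by proper weighting of each stratum variance, obtain the variance in the population around the strata means. Since we are using a stratified sample, and since deviations are measured with reference to the strata means ($\bar{X}_S$) rather than with reference to the population mean ($\bar{X}_P$), we shall designate this measure as $\sigma_{P'}^2$ in order to distinguish it from $\sigma_P^2$ (the population variance in which each deviation is measured from the population mean). We have then

$$\sigma_{P'}^2 = \frac{\sum_1^m \sum_a^{P_S} (X - \bar{X}_S)^2}{P} = \frac{\sum_1^m P_S\,\sigma_S^2}{P}.$$

Since the general expression for the variance of the mean is $\sigma_{\bar{X}}^2 = \dfrac{\sigma^2}{N}$, we may write

$$\sigma_{\bar{X}\ \text{of a stratified sample}}^2 = \frac{\sigma_{P'}^2}{N} = \frac{\sum_1^m P_S\,\sigma_S^2}{P} \div N.$$

Now $\sigma_S^2 = \dfrac{\sum_a^{P_S} (X - \bar{X}_S)^2}{P_S}$, and

$$\sum_a^{P_S} (X - \bar{X}_S)^2 = \sum_a^{P_S} (X - \bar{X}_S - \bar{X}_P + \bar{X}_P)^2$$

$$= \sum_a^{P_S} \left[(X - \bar{X}_P) - (\bar{X}_S - \bar{X}_P)\right]^2$$

$$= \sum_a^{P_S} \left[(X - \bar{X}_P)^2 - 2(X - \bar{X}_P)(\bar{X}_S - \bar{X}_P) + (\bar{X}_S - \bar{X}_P)^2\right]$$

$$= \sum_a^{P_S} (X - \bar{X}_P)^2 - 2(\bar{X}_S - \bar{X}_P)\sum_a^{P_S} (X - \bar{X}_P) + P_S(\bar{X}_S - \bar{X}_P)^2$$

$$= \sum_a^{P_S} (X - \bar{X}_P)^2 - 2(\bar{X}_S - \bar{X}_P)\left(\sum_a^{P_S} X - P_S\bar{X}_P\right) + P_S(\bar{X}_S - \bar{X}_P)^2$$

$$= \sum_a^{P_S} (X - \bar{X}_P)^2 - 2(\bar{X}_S - \bar{X}_P)(P_S\bar{X}_S - P_S\bar{X}_P) + P_S(\bar{X}_S - \bar{X}_P)^2$$

$$= \sum_a^{P_S} (X - \bar{X}_P)^2 - 2P_S(\bar{X}_S - \bar{X}_P)^2 + P_S(\bar{X}_S - \bar{X}_P)^2$$

$$= \sum_a^{P_S} (X - \bar{X}_P)^2 - P_S(\bar{X}_S - \bar{X}_P)^2.$$

Therefore

$$\sigma_S^2 = \frac{\sum_a^{P_S} (X - \bar{X}_P)^2}{P_S} - (\bar{X}_S - \bar{X}_P)^2.$$

Now

$$\sum_1^m P_S\,\sigma_S^2 = \sum_1^m P_S\left[\frac{\sum_a^{P_S} (X - \bar{X}_P)^2}{P_S} - (\bar{X}_S - \bar{X}_P)^2\right], \quad \text{and}$$

$$\sigma_{\bar{X}\ \text{of a stratified sample}}^2 = \frac{\sum_1^m P_S\left[\dfrac{\sum_a^{P_S} (X - \bar{X}_P)^2}{P_S} - (\bar{X}_S - \bar{X}_P)^2\right]}{P} \div N$$

$$= \frac{\sum_1^m \sum_a^{P_S} (X - \bar{X}_P)^2 - \sum_1^m P_S(\bar{X}_S - \bar{X}_P)^2}{P} \div N$$

$$= \frac{\dfrac{\sum_1^m \sum_a^{P_S} (X - \bar{X}_P)^2}{P} - \dfrac{\sum_1^m P_S(\bar{X}_S - \bar{X}_P)^2}{P}}{N}$$

$$= \frac{\sigma_P^2}{N} - \frac{\sigma_{\text{strata means}}^2}{N},$$

where $\sigma_P^2$ is the population variance computed in the usual fashion with reference to the population mean and $\sigma_{\text{strata means}}^2$ is the weighted variance of the strata means.

If the population ($P$) is finite and the sample ($N$) is large in relation to the population, we must refer to the expression given in section XII-1:

$$\sigma_{\bar{X}} = \frac{\sigma_P}{\sqrt{N}}\sqrt{\frac{P - N}{P - 1}}.$$

Squaring gives

$$\sigma_{\bar{X}}^2 = \frac{\sigma_P^2}{N}\cdot\frac{P - N}{P - 1}.$$

For a stratified sample this becomes

$$\sigma_{\bar{X}\ \text{of a stratified sample}}^2 = \left(\frac{\sum_1^m P_S\,\sigma_S^2}{P} \div N\right)\frac{P - N}{P - 1} = \left(\frac{\sigma_P^2}{N} - \frac{\sigma_{\text{strata means}}^2}{N}\right)\frac{P - N}{P - 1}.$$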
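The decomposition at the heart of Section XII-6 (population variance equals the weighted within-strata variance plus the weighted variance of the strata means) can be checked on a toy population. This is a minimal sketch; the strata, sample size, and names are hypothetical, and the finite-population factor is left out.

```python
def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Variance with divisor N."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


strata = [[2, 4, 6], [10, 12, 14, 16], [20, 22]]   # hypothetical strata
population = [x for stratum in strata for x in stratum]
P = len(population)
grand_mean = mean(population)

# Variance about the strata means, weighting each stratum by its size P_S.
within = sum(len(s) * variance(s) for s in strata) / P
# Weighted variance of the strata means about the population mean.
between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in strata) / P
total = variance(population)

N = 3                                               # assumed sample size
var_mean_stratified = within / N
var_mean_unrestricted = total / N
```

Whenever the strata means differ, `var_mean_stratified` is smaller than `var_mean_unrestricted` by exactly `between / N`.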



Section XIII-1


To prove that 



2 

For the data of 40 first cousins, the value of $\chi^2$ was found to be .40 by the usual procedure:

Sex       Observed    Expected (ratio 1 : 1)    $f - f_c$    $(f - f_c)^2$    $\dfrac{(f - f_c)^2}{f_c}$
          $f$         $f_c$
Male      22          20                        +2           4                .20
Female    18          20                        -2           4                .20
Total     40          40                                                      .40

Using $a$, $b$, $p$, $q$, and $N$, as in Chapter XIII, we have from the above

$$\chi^2 = \frac{(a - pN)^2}{pN} + \frac{(b - qN)^2}{qN}$$

$$= \frac{a^2 - 2apN + p^2N^2}{pN} + \frac{b^2 - 2bqN + q^2N^2}{qN}$$

$$= \frac{qa^2 - 2apqN + p^2qN^2 + pb^2 - 2bpqN + q^2pN^2}{Npq}$$

$$= \frac{qa^2 + pb^2 + p^2qN^2 + q^2pN^2 - 2pqN(a + b)}{Npq}.$$

Now $a + b = N$; hence

$$\chi^2 = \frac{qa^2 + pb^2 + p^2qN^2 + q^2pN^2 - 2pqN^2}{Npq}$$

$$= \frac{qa^2 + pb^2 + pqN^2(p + q - 2)}{Npq}.$$

But $p + q = 1$, and $p + q - 2 = -1$; therefore

$$\chi^2 = \frac{qa^2 + pb^2 - pqN^2}{Npq}$$

$$= \frac{qa^2 + pb^2 - pq(a + b)^2}{Npq}$$

$$= \frac{qa^2 + pb^2 - (pqa^2 + 2pqab + pqb^2)}{Npq}.$$

Now $p = 1 - q$, and $q = 1 - p$; therefore

$$\chi^2 = \frac{qa^2 + pb^2 - [(1 - q)qa^2 + 2pqab + (1 - p)pb^2]}{Npq}$$

$$= \frac{qa^2 + pb^2 - (qa^2 - q^2a^2 + 2pqab + pb^2 - p^2b^2)}{Npq}$$

$$= \frac{q^2a^2 - 2pqab + p^2b^2}{Npq}$$

$$= \frac{(qa - pb)^2}{Npq}.$$


Dividing by ^ gives 


^2 = 




Ejf 
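The algebra of this section can be checked against the first-cousin data. The sketch below compares the usual two-term $\chi^2$ with the compact form $(qa - pb)^2 / Npq$ reached in the derivation; the function name is a hypothetical choice.

```python
def chi_square_two_class(a, b, p):
    """Return chi-square for a two-class test in its long and compact forms.

    a, b : observed frequencies in the two classes
    p    : expected proportion in the first class (q = 1 - p in the second)
    """
    q = 1 - p
    n = a + b
    long_form = (a - p * n) ** 2 / (p * n) + (b - q * n) ** 2 / (q * n)
    compact_form = (q * a - p * b) ** 2 / (n * p * q)
    return long_form, compact_form


# 22 males and 18 females observed, expected ratio 1 : 1.
long_form, compact_form = chi_square_two_class(22, 18, 0.5)
```

Both forms give .40, the value found in the table by the usual procedure.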


Section XIII-2

Derivation of Formulae Used for Computing Total Variation, Variation Within Columns, and Variation Between Columns

In the following developments

$N_K$ represents the number of items in a column,
$m$ the number of columns, and
$N$ the number of items in all columns.

$\sum_1^{N_K}$ refers to a summation of items 1 to $N_K$ in a column,

$\sum_1^m$ indicates a summation of values for columns 1 to $m$, and

$\sum_1^N = \Sigma$ designates a summation of all items.

A. The deviation of an item in a column from the grand mean ($Y - \bar{Y}$) may be broken into two parts: first, the deviation of the item from the column mean ($Y - \bar{Y}_1$); and second, the deviation of the column mean from the grand mean ($\bar{Y}_1 - \bar{Y}$). Thus, for an item in the first column,

$$(Y - \bar{Y}) = (Y - \bar{Y}_1) + (\bar{Y}_1 - \bar{Y}).$$

As a measure of total variation, the deviations ($Y - \bar{Y}$) are to be squared and summed. Squaring and summing, first for a single column, we have

$$\sum_1^{N_1} (Y - \bar{Y})^2 = \sum_1^{N_1} \left[(Y - \bar{Y}_1) + (\bar{Y}_1 - \bar{Y})\right]^2$$

$$= \sum_1^{N_1} \left[(Y - \bar{Y}_1)^2 + 2(Y - \bar{Y}_1)(\bar{Y}_1 - \bar{Y}) + (\bar{Y}_1 - \bar{Y})^2\right]$$

$$= \sum_1^{N_1} (Y - \bar{Y}_1)^2 + 2(\bar{Y}_1 - \bar{Y})\sum_1^{N_1} (Y - \bar{Y}_1) + N_1(\bar{Y}_1 - \bar{Y})^2.$$

Now the summation of the deviations of the $Y$ values of the column from the column mean $\bar{Y}_1$ equals zero; that is, $\sum_1^{N_1} (Y - \bar{Y}_1) = 0$, and therefore

$$\sum_1^{N_1} (Y - \bar{Y})^2 = \sum_1^{N_1} (Y - \bar{Y}_1)^2 + N_1(\bar{Y}_1 - \bar{Y})^2$$

and similarly for all other columns.

Summing the preceding expression for all columns gives

$$\sum_1^m \left[\sum_1^{N_K} (Y - \bar{Y})^2\right] = \sum_1^m \left[\sum_1^{N_K} (Y - \bar{Y}_K)^2\right] + \sum_1^m \left[N_K(\bar{Y}_K - \bar{Y})^2\right],$$

or

$$\sum_1^N (Y - \bar{Y})^2 = \sum_1^m \left[\sum_1^{N_K} (Y - \bar{Y}_K)^2\right] + \sum_1^m \left[N_K(\bar{Y}_K - \bar{Y})^2\right].$$

It is apparent that

$\sum_1^N (Y - \bar{Y})^2$ = total variation,

$\sum_1^m \left[\sum_1^{N_K} (Y - \bar{Y}_K)^2\right]$ = variation within columns, and

$\sum_1^m \left[N_K(\bar{Y}_K - \bar{Y})^2\right]$ = variation between columns.


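B. For purposes of computation each of the three above expressions may be simplified:

(1) Total variation, $\sum_1^N (Y - \bar{Y})^2$ or $\Sigma (Y - \bar{Y})^2$. This is the numerator of the expression previously used in computing $\sigma^2$.

$$\Sigma (Y - \bar{Y})^2 = \Sigma (Y^2 - 2Y\bar{Y} + \bar{Y}^2)$$

$$= \Sigma Y^2 - 2\bar{Y}\,\Sigma Y + N\bar{Y}^2$$

$$= \Sigma Y^2 - 2\bar{Y}\,\Sigma Y + \bar{Y}\,\Sigma Y$$

$$= \Sigma Y^2 - \bar{Y}\,\Sigma Y \quad \text{(This form is used in chapters on correlation.)}$$

$$= \Sigma Y^2 - \frac{(\Sigma Y)^2}{N}.$$

(2) Variation within columns, $\sum_1^m \left[\sum_1^{N_K} (Y - \bar{Y}_K)^2\right]$. This expression says: "For each column, sum the squared deviations from the mean of that column; then sum these totals for all columns." For the first column,

$$\sum_1^{N_1} (Y - \bar{Y}_1)^2 = \sum_1^{N_1} (Y^2 - 2Y\bar{Y}_1 + \bar{Y}_1^2)$$

$$= \sum_1^{N_1} Y^2 - 2\bar{Y}_1 \sum_1^{N_1} Y + N_1\bar{Y}_1^2$$

$$= \sum_1^{N_1} Y^2 - 2\bar{Y}_1 \sum_1^{N_1} Y + \bar{Y}_1 \sum_1^{N_1} Y$$

$$= \sum_1^{N_1} Y^2 - \bar{Y}_1 \sum_1^{N_1} Y.$$

Summing this last expression for all columns gives

$$\sum_1^m \left[\sum_1^{N_K} (Y - \bar{Y}_K)^2\right] = \Sigma Y^2 - \sum_1^m \left(\bar{Y}_K \sum_1^{N_K} Y\right) \quad \text{(This form is used in Chapter XXIII.)}$$

$$= \Sigma Y^2 - \sum_1^m \left[\frac{\left(\sum_1^{N_K} Y\right)^2}{N_K}\right].$$

If $N_1 = N_2 = \cdots = N_m$, the expression becomes

$$\sum_1^m \left[\sum_1^{N_K} (Y - \bar{Y}_K)^2\right] = \Sigma Y^2 - \frac{\sum_1^m \left(\sum_1^{N_K} Y\right)^2}{N_K}.$$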
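(3) Variation between columns, $\sum_1^m \left[N_K(\bar{Y}_K - \bar{Y})^2\right]$. This expression says: "For each column, square the deviation of the column mean from the grand mean, multiply by the number of items in the column, and sum these products for all columns."

$$\sum_1^m \left[N_K(\bar{Y}_K - \bar{Y})^2\right] = \sum_1^m \left[N_K(\bar{Y}_K^2 - 2\bar{Y}_K\bar{Y} + \bar{Y}^2)\right]$$

$$= \sum_1^m (N_K\bar{Y}_K^2 - 2N_K\bar{Y}_K\bar{Y} + N_K\bar{Y}^2)$$

$$= \sum_1^m (N_K\bar{Y}_K^2) - 2\bar{Y}\sum_1^m (N_K\bar{Y}_K) + \sum_1^m (N_K\bar{Y}^2).$$

But

$$\sum_1^m (N_K\bar{Y}_K^2) = \sum_1^m \left(\bar{Y}_K \sum_1^{N_K} Y\right),$$

$$\sum_1^m (N_K\bar{Y}_K) = \sum_1^m \left(\sum_1^{N_K} Y\right) = \Sigma Y, \quad \text{and}$$

$$\sum_1^m (N_K\bar{Y}^2) = N\bar{Y}^2 = \bar{Y}\,\Sigma Y.$$

Therefore

$$\sum_1^m \left[N_K(\bar{Y}_K - \bar{Y})^2\right] = \sum_1^m \left(\bar{Y}_K \sum_1^{N_K} Y\right) - 2\bar{Y}\,\Sigma Y + \bar{Y}\,\Sigma Y$$

$$= \sum_1^m \left(\bar{Y}_K \sum_1^{N_K} Y\right) - \bar{Y}\,\Sigma Y \quad \text{(This form is used in Chapter XXIII.)}$$

$$= \sum_1^m \left[\frac{\left(\sum_1^{N_K} Y\right)^2}{N_K}\right] - \frac{(\Sigma Y)^2}{N}.$$

If $N_1 = N_2 = \cdots = N_m$, the expression becomes

$$\sum_1^m \left[N_K(\bar{Y}_K - \bar{Y})^2\right] = \frac{\sum_1^m \left(\sum_1^{N_K} Y\right)^2}{N_K} - \frac{(\Sigma Y)^2}{N}.$$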
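The partition of total variation, and the agreement of the computational forms with the definitional ones, can be confirmed on a small table with unequal column sizes. This is a minimal sketch; the column values and variable names are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


columns = [[4, 6, 8], [10, 12], [1, 2, 3, 6]]      # hypothetical columns
items = [y for col in columns for y in col]
grand_mean = mean(items)

# Definitional forms (part A of the section).
total = sum((y - grand_mean) ** 2 for y in items)
within = sum(sum((y - mean(col)) ** 2 for y in col) for col in columns)
between = sum(len(col) * (mean(col) - grand_mean) ** 2 for col in columns)

# Computational forms (part B of the section).
sum_of_squares = sum(y * y for y in items)
correction = sum(items) ** 2 / len(items)           # (sum Y)^2 / N
column_term = sum(sum(col) ** 2 / len(col) for col in columns)

total_short = sum_of_squares - correction
within_short = sum_of_squares - column_term
between_short = column_term - correction
```

Only the three quantities `sum_of_squares`, `correction`, and `column_term` are needed in practice, which is the computational economy the simplified forms are meant to provide.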




Section XV-1 

Derivation of Normal Equations for Straight Line

If $Y_c$ is a trend or computed value, $Y - Y_c$ is a deviation from trend. To satisfy the least-squares criterion, $\Sigma (Y - Y_c)^2$ must be at a minimum. Since the straight line equation type is $Y_c = a + bX$,

$$\Sigma (Y - Y_c)^2 = \Sigma [Y - (a + bX)]^2 = \Sigma (Y - a - bX)^2.$$

Expanding, this expression becomes

$$\Sigma Y^2 - 2a\,\Sigma Y - 2b\,\Sigma XY + Na^2 + 2ab\,\Sigma X + b^2\,\Sigma X^2.$$

Calling this expression $\phi$, and taking the partial derivative with respect to $a$, we have

$$\frac{\partial \phi}{\partial a} = -2\Sigma Y + 2Na + 2b\,\Sigma X.$$

The value of a curve is at a minimum (or maximum) when its slope is zero. Therefore, setting the partial derivative equal to zero, we have

$$-2\Sigma Y + 2Na + 2b\,\Sigma X = 0,$$

$$\Sigma Y = Na + b\,\Sigma X, \quad \text{which is normal equation I.}$$

Differentiating with respect to $b$:

$$\frac{\partial \phi}{\partial b} = -2\Sigma XY + 2a\,\Sigma X + 2b\,\Sigma X^2.$$

Setting the partial derivative equal to zero:

$$-2\Sigma XY + 2a\,\Sigma X + 2b\,\Sigma X^2 = 0, \quad \text{or}$$

$$\Sigma XY = a\,\Sigma X + b\,\Sigma X^2, \quad \text{which is normal equation II.}$$
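The two normal equations can be solved simultaneously for $a$ and $b$. The following is a minimal sketch; the closed-form solution by elimination is standard algebra rather than something stated in this section, and the data and function name are hypothetical.

```python
def fit_line(xs, ys):
    """Solve normal equations I and II for a and b in Y_c = a + b*X."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a between the two equations gives b; equation I then gives a.
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = (sy - b * sx) / n
    return a, b


xs = [1, 2, 3, 4, 5]                       # hypothetical X values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]            # hypothetical Y values
a, b = fit_line(xs, ys)

# Residuals of the two normal equations; both should vanish.
eq1 = sum(ys) - (len(xs) * a + b * sum(xs))
eq2 = sum(x * y for x, y in zip(xs, ys)) - (a * sum(xs) + b * sum(x * x for x in xs))
```

The fit requires at least two distinct $X$ values; otherwise the denominator $N\Sigma X^2 - (\Sigma X)^2$ is zero.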


Section XV-2 

The Least Squares Criterion 

The following discussion assumes that the distribution of chance errors 
follows the normal curve, and that the best central value from which to 
measure such accidental deviations is therefore that value which makes it 
most probable that the deviations are distributed normally. 

Let a series of such deviations, or errors, and the interval within which they fall be designated by the following symbols:

$x_1$ is an item falling at the mid-point of a very small interval, $\Delta x_1$;

$x_2$ is an item falling at the mid-point of a very small interval, $\Delta x_2$;

$\ldots$

$x_N$ is an item falling at the mid-point of a very small interval, $\Delta x_N$.

Now the probability that a deviation will fall within a certain interval is

$$P = \frac{\text{Area of frequency curve within boundaries of that interval}}{\text{Area of entire frequency curve}}.$$

Thus the probability of obtaining an error $x_1$ which falls within the interval $\Delta x_1$ is approximately the ratio of the area of a rectangle, with base of $\Delta x_1$ and height the ordinate at the mid-point of the interval, to the area of the entire frequency curve.

If this curve is the normal curve, this probability is

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x_1^2}{2\sigma^2}}\,\Delta x_1,$$

since the expression for the ordinate of a normal curve as a ratio to the entire number of frequencies is $Y_c = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$.

The probability of obtaining errors $x_2$, $x_3$, etc., falling within specified intervals is similarly obtained.

The probability that several independent events will occur is the product of the individual probabilities of the separate events. Therefore the probability that the particular set of errors will occur which we have assumed (that is, a normal distribution of errors) is as follows:

$$P = \left(\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x_1^2}{2\sigma^2}}\,\Delta x_1\right) \times \left(\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x_2^2}{2\sigma^2}}\,\Delta x_2\right) \times \cdots \times \left(\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x_N^2}{2\sigma^2}}\,\Delta x_N\right)$$

$$= \frac{1}{\sigma^N (2\pi)^{N/2}}\,e^{-\frac{x_1^2 + x_2^2 + \cdots + x_N^2}{2\sigma^2}} \times \Delta x_1 \times \Delta x_2 \times \cdots \times \Delta x_N.$$

Since any number greater than 1 raised to a negative power will be greatest when the numerical value of that exponent is least, $P$ is greatest when $x_1^2 + x_2^2 + \cdots + x_N^2$ is least. Therefore the probability that accidental deviations from some central value will follow the normal curve is greatest when the sum of the squared deviations from that central value is at a minimum.


Section XVI-1 

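Derivation of Equations for Fitting Growth Curve $Y_c = k + ab^X$

Designating by $n$ the number of years in each third of the data, the first equation (see equation I, p. 443) is:

$$\Sigma_1 Y = nk + a + ab + ab^2 + \cdots + ab^{n-1}$$

$$= nk + a\left[1 + b + b^2 + b^3 + \cdots + b^{n-1}\right].$$

If now the expression inside the brackets be multiplied by $\dfrac{b - 1}{b - 1}$, we have

$$\frac{[1 + b + b^2 + b^3 + \cdots + b^{(n-1)}](b - 1)}{b - 1} \qquad (1)$$

$$= \frac{b + b^2 + b^3 + \cdots + b^{(n-1)} + b^n - 1 - b - b^2 - b^3 - \cdots - b^{(n-1)}}{b - 1} \qquad (2)$$

$$= \frac{b^n - 1}{b - 1}.$$

The fourth term shown in the numerator of expression (2) is $b^{(n-1)}$. This follows from the fact that the next to the last term within the brackets of expression (1) may also be designated as $b^{(n-2)}$, and $b^{(n-2)} \times b = b^{(n-1)}$. All three equations are obtained in a similar fashion. They are:

I. $\Sigma_1 Y = nk + a\,\dfrac{b^n - 1}{b - 1}$.

II. $\Sigma_2 Y = nk + ab^n\,\dfrac{b^n - 1}{b - 1}$.

III. $\Sigma_3 Y = nk + ab^{2n}\,\dfrac{b^n - 1}{b - 1}$.

Equations A, B, and C now are:

A. $\Sigma_2 Y - \Sigma_1 Y = a\,\dfrac{b^n - 1}{b - 1}\,(b^n - 1) = a\,\dfrac{(b^n - 1)^2}{b - 1}$.

B. $\Sigma_3 Y - \Sigma_2 Y = ab^n\,\dfrac{(b^n - 1)^2}{b - 1}$.

C. $\dfrac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y} = \dfrac{ab^n\,\dfrac{(b^n - 1)^2}{b - 1}}{a\,\dfrac{(b^n - 1)^2}{b - 1}} = b^n$.

Equation A gives us the formula for $a$:

$$\Sigma_2 Y - \Sigma_1 Y = a\,\frac{(b^n - 1)^2}{b - 1},$$

$$a = (\Sigma_2 Y - \Sigma_1 Y)\,\frac{b - 1}{(b^n - 1)^2}.$$

From equation I we find:

$$\Sigma_1 Y = nk + a\,\frac{b^n - 1}{b - 1},$$

$$k = \frac{1}{n}\left[\Sigma_1 Y - a\,\frac{b^n - 1}{b - 1}\right].$$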
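Equations C, A, and I can be applied in that order to recover $b$, $a$, and $k$ from the three partial sums. This is a minimal sketch that assumes $X = 0, 1, 2, \ldots$, a number of observations divisible by three, and a positive ratio in equation C; the function name and test series are hypothetical.

```python
def fit_modified_exponential(ys):
    """Fit Y_c = k + a * b**X (X = 0, 1, 2, ...) from three partial sums."""
    if len(ys) % 3 != 0:
        raise ValueError("number of observations must be divisible by 3")
    n = len(ys) // 3
    s1, s2, s3 = sum(ys[:n]), sum(ys[n:2 * n]), sum(ys[2 * n:])
    ratio = (s3 - s2) / (s2 - s1)          # equation C: equals b**n
    if ratio <= 0:
        raise ValueError("data do not follow this curve type")
    b = ratio ** (1 / n)
    a = (s2 - s1) * (b - 1) / (b ** n - 1) ** 2       # from equation A
    k = (s1 - a * (b ** n - 1) / (b - 1)) / n         # from equation I
    return k, a, b


# Series generated exactly from k = 100, a = -50, b = 0.8.
ys = [100 - 50 * 0.8 ** x for x in range(9)]
k, a, b = fit_modified_exponential(ys)
```

Because the test series lies exactly on the curve, the fitted constants reproduce the generating values up to rounding; with real data the curve passes through the partial sums rather than every point.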
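Section XXII-1

To prove that $\bar{Y}_c = \bar{Y}$.

$$Y_c = a + bX.$$

$$\Sigma Y_c = \Sigma (a + bX) = Na + b\,\Sigma X.$$

But $Na + b\,\Sigma X = \Sigma Y$ (Normal equation I).

Therefore

$$\Sigma Y_c = \Sigma Y \qquad (1)$$

$$\frac{\Sigma Y_c}{N} = \frac{\Sigma Y}{N}, \quad \bar{Y}_c = \bar{Y} \qquad (2)$$

To prove that $\Sigma Y_c^2 = a\,\Sigma Y + b\,\Sigma XY$.

$$\Sigma Y_c^2 = \Sigma (a + bX)^2$$

$$= \Sigma (a^2 + 2abX + b^2X^2)$$

$$= Na^2 + 2ab\,\Sigma X + b^2\,\Sigma X^2$$

$$= a(Na + b\,\Sigma X) + b(a\,\Sigma X + b\,\Sigma X^2).$$

But $Na + b\,\Sigma X = \Sigma Y$ (Normal equation I), and $a\,\Sigma X + b\,\Sigma X^2 = \Sigma XY$ (Normal equation II).

Therefore

$$\Sigma Y_c^2 = a\,\Sigma Y + b\,\Sigma XY \qquad (3)$$

To prove that $\Sigma y_c^2 = \Sigma Y_c^2 - \bar{Y}\,\Sigma Y$ or $(a\,\Sigma Y + b\,\Sigma XY) - \bar{Y}\,\Sigma Y$.

It has been shown (in Appendix B, section XIII-2) that

$$\Sigma y^2 = \Sigma Y^2 - \bar{Y}\,\Sigma Y.$$

Similarly it is true that $\Sigma y_c^2 = \Sigma Y_c^2 - \bar{Y}_c\,\Sigma Y_c$.

But $\bar{Y}_c = \bar{Y}$ (equation 2) and $\Sigma Y_c = \Sigma Y$ (equation 1).

Therefore

$$\Sigma y_c^2 = \Sigma Y_c^2 - \bar{Y}\,\Sigma Y \qquad (4)$$

However, since $\Sigma Y_c^2 = a\,\Sigma Y + b\,\Sigma XY$ (equation 3), it follows that

$$\Sigma y_c^2 = (a\,\Sigma Y + b\,\Sigma XY) - \bar{Y}\,\Sigma Y \qquad (5)$$

To prove that $\Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2$ or $\Sigma Y^2 - (a\,\Sigma Y + b\,\Sigma XY)$.

$$\Sigma y_s^2 = \Sigma (Y - Y_c)^2$$

$$= \Sigma Y^2 - 2\Sigma YY_c + \Sigma Y_c^2.$$

But $Y_c = a + bX$; hence $\Sigma YY_c = \Sigma [Y(a + bX)] = \Sigma (aY + bXY) = a\,\Sigma Y + b\,\Sigma XY$.

Now $a\,\Sigma Y + b\,\Sigma XY = \Sigma Y_c^2$ (equation 3).

Therefore

$$\Sigma y_s^2 = \Sigma Y^2 - 2\Sigma Y_c^2 + \Sigma Y_c^2$$

$$= \Sigma Y^2 - \Sigma Y_c^2 \qquad (6)$$

$$= \Sigma Y^2 - (a\,\Sigma Y + b\,\Sigma XY) \qquad (7)$$

To prove that $\sigma_y^2 = \sigma_{y_c}^2 + \sigma_{y_s}^2$.

This expression may be written:

$$\frac{\Sigma y^2}{N} = \frac{\Sigma y_c^2}{N} + \frac{\Sigma y_s^2}{N}.$$

Multiplying by $N$, we have

$$\Sigma y^2 = \Sigma y_c^2 + \Sigma y_s^2.$$

But we know that $\Sigma y^2 = \Sigma Y^2 - \bar{Y}\,\Sigma Y$, and it has been shown that $\Sigma y_c^2 = \Sigma Y_c^2 - \bar{Y}\,\Sigma Y$ (equation 4), and that $\Sigma y_s^2 = \Sigma Y^2 - \Sigma Y_c^2$ (equation 6). Therefore, substituting, we have

$$\Sigma Y^2 - \bar{Y}\,\Sigma Y = \Sigma Y_c^2 - \bar{Y}\,\Sigma Y + \Sigma Y^2 - \Sigma Y_c^2$$

$$= \Sigma Y^2 - \bar{Y}\,\Sigma Y.$$

Thus

$$\Sigma y^2 = \Sigma y_c^2 + \Sigma y_s^2 \qquad (8)$$

and

$$\sigma_y^2 = \sigma_{y_c}^2 + \sigma_{y_s}^2 \qquad (9)$$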

Section XXII-2

Derivation of Constants for Straight Line Equation when Origin is at $\bar{X}, \bar{Y}$

The normal equations for fitting a straight line by the method of least squares are

$$\Sigma Y = Na + b\,\Sigma X;$$

$$\Sigma XY = a\,\Sigma X + b\,\Sigma X^2.$$

If the origin be taken at $\bar{X}, \bar{Y}$ instead of $0, 0$ we have

$$\Sigma y = Na + b\,\Sigma x;$$

$$\Sigma xy = a\,\Sigma x + b\,\Sigma x^2.$$

But $\Sigma y = 0$, and $\Sigma x = 0$.

Therefore $a = 0$, and $b = \dfrac{\Sigma xy}{\Sigma x^2}$.

The estimating equation becomes $y_c = bx$ instead of $Y_c = a + bX$.


Section XXII-3 


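Given that $r = \sqrt{\dfrac{\sigma_{y_c}^2}{\sigma_y^2}} = \sqrt{\dfrac{\Sigma y_c^2}{\Sigma y^2}}$, to prove that $r = \dfrac{\Sigma xy}{N\sigma_x\sigma_y}$.

It follows from equation 5 of section XXII-1 that

$$\Sigma y_c^2 = b\,\Sigma xy = \frac{\Sigma xy}{\Sigma x^2}\,\Sigma xy = \frac{(\Sigma xy)^2}{\Sigma x^2}.$$

Therefore

$$r^2 = \frac{(\Sigma xy)^2}{\Sigma x^2}\cdot\frac{1}{\Sigma y^2} = \frac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}.$$

But $\Sigma x^2 = N\sigma_x^2$, and $\Sigma y^2 = N\sigma_y^2$.

Hence

$$r^2 = \frac{(\Sigma xy)^2}{N^2\sigma_x^2\sigma_y^2}, \quad \text{and}$$

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y}.$$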


Section XXII-4 

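To prove that $r = \dfrac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}$.

$$\Sigma xy = \Sigma [(X - \bar{X})(Y - \bar{Y})] = \Sigma (XY - \bar{X}Y - X\bar{Y} + \bar{X}\bar{Y})$$

$$= \Sigma XY - \bar{X}\,\Sigma Y - \bar{Y}\,\Sigma X + N\bar{X}\bar{Y}$$

$$= \Sigma XY - N\bar{X}\bar{Y} - N\bar{X}\bar{Y} + N\bar{X}\bar{Y}$$

$$= \Sigma XY - N\bar{X}\bar{Y}.$$

$$\sigma_x = \sqrt{\frac{\Sigma X^2}{N} - \left(\frac{\Sigma X}{N}\right)^2}, \quad \sigma_y = \sqrt{\frac{\Sigma Y^2}{N} - \left(\frac{\Sigma Y}{N}\right)^2}.$$

Therefore

$$r = \frac{\Sigma xy}{N\sigma_x\sigma_y} = \frac{\Sigma XY - N\bar{X}\bar{Y}}{N\sqrt{\dfrac{\Sigma X^2}{N} - \left(\dfrac{\Sigma X}{N}\right)^2}\sqrt{\dfrac{\Sigma Y^2}{N} - \left(\dfrac{\Sigma Y}{N}\right)^2}}$$

$$= \frac{N(\Sigma XY - N\bar{X}\bar{Y})}{\left[N\sqrt{\dfrac{\Sigma X^2}{N} - \left(\dfrac{\Sigma X}{N}\right)^2}\right]\left[N\sqrt{\dfrac{\Sigma Y^2}{N} - \left(\dfrac{\Sigma Y}{N}\right)^2}\right]}$$

$$= \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2][N\Sigma Y^2 - (\Sigma Y)^2]}}.$$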

Section XEII-5 

Changing Units and Shifting Origin for a Straight Line Equation, 
Correlation of Grouped Data 

The equation dy^ == .3561 — .4737 dy is stated in terms of class-interval 
deviations from assumed means, Thus the value of 7 at the assumed 
mean of X (44.5) is .3561 intervals above the assumed mean of Y (65). 



APPENDIX B 


857 


To find the value of a when X (rather than $d_x$) is zero, we must therefore
substitute -4.45 for $d_x$ in the equation, since the assumed mean of 44.5
lies 4.45 class intervals above zero. Thus

$$d_{y_c} = .3561 + (-4.45)(-.4737) = 2.4641.$$

When X is zero, a is therefore 2.4641 intervals above the assumed mean
of Y. It is thus 2.4641 × 10 physical units above 65, or 89.641.

In the original equation, b indicates how many intervals Y increases
with an increase of one interval of X. To put b in terms of the original
units, we must multiply b by the ratio of the Y interval to the X interval.
In this case, however, the class interval of each variable is 10, so that
the multiplication does not change the value of b.

The equation then is $Y_c = 89.64 - .4737X$.
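
The conversion just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and parameter names are assumptions, not notation from the text. It takes the constants of the equation in class-interval deviations, together with the assumed means and class intervals, and returns a and b in original units with the origin at zero.

```python
def line_in_original_units(a_dev, b_dev, assumed_mean_x, assumed_mean_y,
                           interval_x, interval_y):
    """Convert d_yc = a_dev + b_dev * d_x (class-interval deviations from
    assumed means) into Y_c = a + b * X (original units, origin at 0,0)."""
    # b is rescaled by the ratio of the Y interval to the X interval.
    b = b_dev * interval_y / interval_x
    # Value of d_x, in intervals, at which X itself is zero.
    dx_at_zero = (0 - assumed_mean_x) / interval_x
    # Height of the line at X = 0, in intervals above the assumed mean of Y,
    # then converted to physical units.
    a = assumed_mean_y + (a_dev + b_dev * dx_at_zero) * interval_y
    return a, b


a, b = line_in_original_units(0.3561, -0.4737, 44.5, 65, 10, 10)
print(round(a, 2), b)
```

With the figures of this section the sketch gives a of about 89.64 and leaves b at -.4737, in agreement with the equation above.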


Section XXII-6 

Derivation of Expression for Population Estimate of r 




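$$\frac{\sigma_{y_s}^2}{\sigma_y^2} = 1 - r^2 \quad\text{and}\quad \frac{\bar{\sigma}_{y_s}^2}{\bar{\sigma}_y^2} = 1 - \bar{r}^2.$$

But $\bar{\sigma}_y^2 = \dfrac{\Sigma y^2}{N - 1} = \dfrac{N\sigma_y^2}{N - 1}$, since the y deviations are measured from the
mean, which is one constant; and $\bar{\sigma}_{y_s}^2 = \dfrac{\Sigma y_s^2}{N - m} = \dfrac{N\sigma_{y_s}^2}{N - m}$, since the $y_s$
deviations are measured from an estimating line containing m constants.
Therefore

$$\begin{aligned}
\bar{r}^2 &= 1 - \frac{N\sigma_{y_s}^2 \div (N - m)}{N\sigma_y^2 \div (N - 1)} \\
&= 1 - \frac{\sigma_{y_s}^2 (N - 1)}{\sigma_y^2 (N - m)} \\
&= 1 - (1 - r^2)\,\frac{N - 1}{N - m} \\
&= \frac{(N - m) - (1 - r^2)(N - 1)}{N - m} \\
&= \frac{r^2(N - 1) - (m - 1)}{N - m}.
\end{aligned}$$

In the above form the expression may be used for linear or nonlinear
correlation, for multiple correlation, and for the correlation ratio. For
simple linear correlation, m = 2; therefore we may write

$$\bar{r}^2 = \frac{r^2(N - 1) - (2 - 1)}{N - 2} = \frac{r^2(N - 1) - 1}{N - 2}.$$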

The derivation of the form used for the population estimate of a coefficient 
of partial correlation is shown in Section XXIV-4 of this Appendix. 
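
As a rough illustration of the expression derived in this section, the following Python sketch evaluates the population estimate in its final form and in the intermediate form 1 - (1 - r²)(N - 1)/(N - m). The function names and the sample values (r² = .64, N = 12) are hypothetical and serve only to show that the two forms agree.

```python
def population_estimate_r2(r2, n, m=2):
    """Final form: [r^2 (N - 1) - (m - 1)] / (N - m), where m is the number
    of constants in the estimating equation (2 for a straight line)."""
    return (r2 * (n - 1) - (m - 1)) / (n - m)


def population_estimate_r2_alt(r2, n, m=2):
    """Intermediate form from the derivation: 1 - (1 - r^2)(N - 1)/(N - m)."""
    return 1 - (1 - r2) * (n - 1) / (n - m)


# Hypothetical sample result: r^2 = .64 from N = 12 pairs, straight line.
print(population_estimate_r2(0.64, 12), population_estimate_r2_alt(0.64, 12))
```

For small N the estimate falls noticeably below the sample value, and it can even turn negative when r² is small, which is a property of the formula rather than of the sketch.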


Section XXIII-1

The flexibility of price, according to the equation $Y_c = 1{,}735{,}128X^{-2.149117}$,
is -2.149117 throughout. With the other types of equations used in this
chapter, the flexibility varies with quantity sold.

Flexibility of price is the percentage change in price ($Y_c$) associated with
an infinitely small percentage change in quantity (X), or

$$\text{Flexibility} = \frac{dY_c}{Y_c} \div \frac{dX}{X}.$$

This expression is more convenient to compute if written

$$\text{Flexibility} = \frac{dY_c}{dX} \times \frac{X}{Y_c}.$$

Therefore to compute the flexibility of price, differentiate with respect
to X and multiply the derivative by the ratio of $\dfrac{X}{Y_c}$ at the point desired.

In the present instance we have

$$\text{Flexibility} = \left[(-2.149117)(1{,}735{,}128)X^{-3.149117}\right]\left[\frac{X}{1{,}735{,}128X^{-2.149117}}\right] = -2.149117.$$
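
The constancy of the flexibility for this type of equation can be seen numerically. The sketch below is an illustration only: it approximates the derivative by a central difference instead of differentiating analytically, and the function names and the two quantities tried (100 and 250) are hypothetical.

```python
A = 1_735_128      # constant of the equation in this section
B = -2.149117      # exponent of the equation in this section


def price(x):
    """Estimated price Y_c = A * X**B."""
    return A * x ** B


def flexibility(x, relative_step=1e-6):
    """(dY_c/dX) * (X / Y_c), with the derivative approximated by a
    central difference over a small step proportional to x."""
    h = x * relative_step
    derivative = (price(x + h) - price(x - h)) / (2 * h)
    return derivative * x / price(x)


# Hypothetical quantities; the result should be about B at every point.
print(flexibility(100.0), flexibility(250.0))
```

Any positive quantity gives the same result to within the error of the numerical derivative, which is what "throughout" means in the text.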


Section XXIII-2

The point of diminishing returns is the highest point in the curve
$Y_c = 890.324 + 78.264X + 20.324X^2 - 4.4649X^3$. At this point the
slope is zero. The slope of a curve at any point may be found by taking
the first derivative of the equation. The first derivative of the above
equation is:

$$\frac{dY_c}{dX} = 78.264 + 40.648X - 13.3947X^2.$$





Setting $\dfrac{dY_c}{dX} = 0$, we have

$$78.264 + 40.648X - 13.3947X^2 = 0,$$ and

$$X = \frac{-40.648 \pm \sqrt{(40.648)^2 - 4(-13.3947)(78.264)}}{2(-13.3947)} = 4.37128, \text{ or } -1.33669.$$

When the slope is zero, we have a maximum or a minimum point. In
this case only positive values of X are of interest, and inspection of Chart
228 indicates that a maximum is reached when X is close to 4. Or, if
the reader will compute $Y_c$ values in the neighborhood of X = -1.33669
and X = 4.37128, he will discover that the former is a minimum and the
latter a maximum.

When X = 4.37128,

$$Y_c = 890.324 + 78.264(4.37128) + 20.324(4.37128)^2 - 4.4649(4.37128)^3 = 1{,}247.85.$$

The point of diminishing total returns is 4.37128 per cent nitrogen. At
this point the estimated yield is 1,247.85 pounds.

The point of diminishing marginal returns is the point of inflection in
the curve. It is the point where the change in the slope is zero. The
change in the slope is the second derivative of the estimating equation.
Thus

$$\frac{d^2Y_c}{dX^2} = 40.648 - 26.7894X.$$

Setting $\dfrac{d^2Y_c}{dX^2} = 0$,

$$40.648 - 26.7894X = 0, \quad\text{and}\quad X = 1.517317.$$

This is the point of diminishing marginal returns. At this point

$$Y_c = 890.324 + 78.264(1.517317) + 20.324(1.517317)^2 - 4.4649(1.517317)^3 = 1{,}040.27.$$
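
The two points found in this section can be recomputed with a short Python sketch. It is offered only as a check on the arithmetic: the variable and function names are assumptions, and the quadratic formula is applied directly to the first derivative.

```python
from math import sqrt

# Coefficients of Y_c = A0 + A1*X + A2*X**2 + A3*X**3 from this section.
A0, A1, A2, A3 = 890.324, 78.264, 20.324, -4.4649


def yield_estimate(x):
    """Estimated yield Y_c at X per cent nitrogen."""
    return A0 + A1 * x + A2 * x ** 2 + A3 * x ** 3


# First derivative: A1 + 2*A2*X + 3*A3*X**2, a quadratic in X.
qa, qb, qc = 3 * A3, 2 * A2, A1
root_term = sqrt(qb ** 2 - 4 * qa * qc)
turning_points = sorted([(-qb + root_term) / (2 * qa),
                         (-qb - root_term) / (2 * qa)])
x_minimum, x_maximum = turning_points

# Second derivative: 2*A2 + 6*A3*X, zero at the point of inflection.
x_inflection = -2 * A2 / (6 * A3)

print(x_minimum, x_maximum, yield_estimate(x_maximum))
print(x_inflection, yield_estimate(x_inflection))
```

The larger root is the maximum here because the coefficient of X³ is negative, so the curve eventually turns downward.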


Section XXIII-3 

Changing Units and Shifting Origin for a Second Degree Curve 

The equation as stated is:

$$d_{y_c} = -.55290435 - .40268355d_x + .058222561(d_x)^2.$$

Origin: $\bar{X}_d$, $\bar{Y}_d$. Units: intervals of X and Y.

To obtain the equation in original units, but with origin at $\bar{X}_d$, $\bar{Y}_d$:

$$a = -.55290435 \times 25 = -13.82261;$$

$$b = -.40268355 \times \frac{25}{66.67} = -.1510063;$$

$$c = .058222561 \times \frac{25}{(66.67)^2} = .0003275019;$$

and the equation becomes

$$d_{y_c} = -13.82261 - .1510063d_x + .0003275019(d_x)^2.$$

Origin: X = 633.33, Y = 112.5.

To shift origin to X = 0, Y = 0, find the $d_{y_c}$ value when $d_x$ = -633.33.
Thus

$$d_{y_c} = -13.82261 - .1510063(-633.33) + .0003275019(-633.33)^2 = 213.17936.$$

$$a = 213.17936 + 112.5 = 325.679.$$

But the slope of the line will also be different when X = 0.

To find the slope of the line $Y_c = a + bX + cX^2$, differentiate with
respect to X and substitute the desired value of X (in this case, -633.33).

$$\frac{dY_c}{dX} = b + 2cX = -.1510063 + 2(.0003275019)(-633.33) = -.5658420.$$

It is not necessary to find a new value for c, however, as the change in the
slope is the same throughout.

Therefore $Y_c = 325.679 - .5658420X + .0003275019X^2$.
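
The shift of origin for a second degree curve can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the inputs are the constants already expressed in original units with origin at X = 633.33, Y = 112.5.

```python
def shift_origin_to_zero(a, b, c, origin_x, origin_y):
    """Given d_y = a + b*d_x + c*d_x**2, where d_x = X - origin_x and
    d_y = Y - origin_y, return (A, B, C) for Y = A + B*X + C*X**2."""
    d = -origin_x                                 # value of d_x when X = 0
    new_a = origin_y + a + b * d + c * d ** 2     # height of curve at X = 0
    new_b = b + 2 * c * d                         # slope of curve at X = 0
    return new_a, new_b, c                        # c is unchanged


def old_form(x):
    """Y from the equation with origin at X = 633.33, Y = 112.5."""
    dx = x - 633.33
    return 112.5 + (-13.82261 - 0.1510063 * dx + 0.0003275019 * dx ** 2)


A, B, C = shift_origin_to_zero(-13.82261, -0.1510063, 0.0003275019,
                               633.33, 112.5)
print(A, B, C)
# The two forms describe the same curve, for example at X = 700:
print(old_form(700.0), A + B * 700.0 + C * 700.0 ** 2)
```

Carried at full floating-point precision the constants differ from the printed ones only in the later decimal places, which reflects rounding in the hand computation.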


Section XXIV-1 

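To prove that $a_{1.23} = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3$.

Normal equation I is

$$\Sigma X_1 = Na_{1.23} + b_{12.3}\Sigma X_2 + b_{13.2}\Sigma X_3.$$

But $x = X - \bar{X}$, and $X = x + \bar{X}$. Therefore we may write for normal
equation I:

$$\Sigma(x_1 + \bar{X}_1) = Na_{1.23} + b_{12.3}\Sigma(x_2 + \bar{X}_2) + b_{13.2}\Sigma(x_3 + \bar{X}_3), \text{ or}$$

$$\Sigma x_1 + N\bar{X}_1 = Na_{1.23} + b_{12.3}(\Sigma x_2 + N\bar{X}_2) + b_{13.2}(\Sigma x_3 + N\bar{X}_3).$$

But $\Sigma x = 0$. Thus

$$N\bar{X}_1 = Na_{1.23} + b_{12.3}N\bar{X}_2 + b_{13.2}N\bar{X}_3, \text{ and}$$

$$a_{1.23} = \bar{X}_1 - b_{12.3}\bar{X}_2 - b_{13.2}\bar{X}_3.$$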
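

Section XXIV-2

Proof that

$$\left(\frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}\right)^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}.$$

A demonstration for the other formulae of these types would proceed
along similar lines.

If
$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}, \tag{1}$$

$$r_{12.3}^2 = \frac{r_{12}^2 - 2r_{12}r_{13}r_{23} + r_{13}^2r_{23}^2}{1 - r_{13}^2 - r_{23}^2 + r_{13}^2r_{23}^2}.$$

But $r_{12}^2 = \dfrac{(\Sigma x_1x_2)^2}{\Sigma x_1^2\,\Sigma x_2^2}$, $r_{12} = \dfrac{\Sigma x_1x_2}{\sqrt{\Sigma x_1^2\,\Sigma x_2^2}}$, and similar formulae obtain for the
other r's. Therefore:

$$r_{12.3}^2 = \frac{\dfrac{(\Sigma x_1x_2)^2}{\Sigma x_1^2\,\Sigma x_2^2} - 2\,\dfrac{\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3}{\Sigma x_1^2\,\Sigma x_2^2\,\Sigma x_3^2} + \dfrac{(\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}{\Sigma x_1^2\,\Sigma x_2^2\,(\Sigma x_3^2)^2}}{1 - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_1^2\,\Sigma x_3^2} - \dfrac{(\Sigma x_2x_3)^2}{\Sigma x_2^2\,\Sigma x_3^2} + \dfrac{(\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}{\Sigma x_1^2\,\Sigma x_2^2\,(\Sigma x_3^2)^2}}.$$

Multiplying numerator and denominator by $\Sigma x_1^2\,\Sigma x_2^2\,(\Sigma x_3^2)^2$, this simplifies
to the following equation:

$$r_{12.3}^2 = \frac{(\Sigma x_3^2)^2(\Sigma x_1x_2)^2 - 2\Sigma x_3^2\,\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3 + (\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}{\Sigma x_1^2\,\Sigma x_2^2\,(\Sigma x_3^2)^2 - \Sigma x_2^2\,\Sigma x_3^2\,(\Sigma x_1x_3)^2 - \Sigma x_1^2\,\Sigma x_3^2\,(\Sigma x_2x_3)^2 + (\Sigma x_1x_3)^2(\Sigma x_2x_3)^2}. \tag{2}$$

Now consider
$$r_{12.3}^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}. \tag{3}$$

But
$$\Sigma x_{c1.3}^2 = b_{13}\Sigma x_1x_3 = \frac{\Sigma x_1x_3}{\Sigma x_3^2}\,\Sigma x_1x_3 = \frac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}.$$

Also
$$\Sigma x_{c1.23}^2 = b_{12.3}\Sigma x_1x_2 + b_{13.2}\Sigma x_1x_3.$$

Now, the normal equations for deriving $b_{12.3}$ and $b_{13.2}$ are:

$$\text{II.}\quad \Sigma x_1x_2 = b_{12.3}\Sigma x_2^2 + b_{13.2}\Sigma x_2x_3;$$

$$\text{III.}\quad \Sigma x_1x_3 = b_{12.3}\Sigma x_2x_3 + b_{13.2}\Sigma x_3^2.$$

In order to solve for $b_{13.2}$, we may multiply equation II by $\Sigma x_2x_3$, and
equation III by $\Sigma x_2^2$, and subtract equation II from equation III. Thus

$$\text{II.}\quad \Sigma x_1x_2\,\Sigma x_2x_3 = b_{12.3}\Sigma x_2^2\,\Sigma x_2x_3 + b_{13.2}(\Sigma x_2x_3)^2$$

$$\text{III.}\quad \Sigma x_1x_3\,\Sigma x_2^2 = b_{12.3}\Sigma x_2^2\,\Sigma x_2x_3 + b_{13.2}\Sigma x_2^2\,\Sigma x_3^2$$

$$\Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3 = b_{13.2}\Sigma x_2^2\,\Sigma x_3^2 - b_{13.2}(\Sigma x_2x_3)^2$$

$$b_{13.2} = \frac{\Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3}{\Sigma x_2^2\,\Sigma x_3^2 - (\Sigma x_2x_3)^2}.$$

In a similar fashion we may solve for $b_{12.3}$. This involves multiplying
equation II by $\Sigma x_3^2$ and equation III by $\Sigma x_2x_3$. By such a process we
find that

$$b_{12.3} = \frac{\Sigma x_1x_3\,\Sigma x_2x_3 - \Sigma x_1x_2\,\Sigma x_3^2}{(\Sigma x_2x_3)^2 - \Sigma x_2^2\,\Sigma x_3^2}.$$

Substituting these expressions for $b_{13.2}$ and $b_{12.3}$ in the equation for $\Sigma x_{c1.23}^2$,
we have

$$\Sigma x_{c1.23}^2 = \frac{\Sigma x_1x_3\,\Sigma x_2x_3 - \Sigma x_1x_2\,\Sigma x_3^2}{(\Sigma x_2x_3)^2 - \Sigma x_2^2\,\Sigma x_3^2}\,\Sigma x_1x_2 + \frac{\Sigma x_1x_3\,\Sigma x_2^2 - \Sigma x_1x_2\,\Sigma x_2x_3}{\Sigma x_2^2\,\Sigma x_3^2 - (\Sigma x_2x_3)^2}\,\Sigma x_1x_3.$$

This simplifies to

$$\Sigma x_{c1.23}^2 = \frac{(\Sigma x_1x_3)^2\,\Sigma x_2^2 + (\Sigma x_1x_2)^2\,\Sigma x_3^2 - 2\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3}{\Sigma x_2^2\,\Sigma x_3^2 - (\Sigma x_2x_3)^2}.$$

Now substituting our expressions for $\Sigma x_{c1.23}^2$ and $\Sigma x_{c1.3}^2$ in formula (3) we
have

$$r_{12.3}^2 = \frac{\dfrac{(\Sigma x_1x_3)^2\,\Sigma x_2^2 + (\Sigma x_1x_2)^2\,\Sigma x_3^2 - 2\Sigma x_1x_2\,\Sigma x_1x_3\,\Sigma x_2x_3}{\Sigma x_2^2\,\Sigma x_3^2 - (\Sigma x_2x_3)^2} - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}}{\Sigma x_1^2 - \dfrac{(\Sigma x_1x_3)^2}{\Sigma x_3^2}}.$$

Expanding and simplifying, this expression becomes equation (2). Therefore

$$\left(\frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}\right)^2 = \frac{\Sigma x_{c1.23}^2 - \Sigma x_{c1.3}^2}{\Sigma x_1^2 - \Sigma x_{c1.3}^2}.$$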
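
The identity proved in this section can be checked numerically. The Python sketch below is illustrative only: the three short series are hypothetical observations, and the names are assumptions. It evaluates the left side from the simple correlation coefficients and the right side from the explained sums of squares, using the expressions for the b's obtained above.

```python
from math import sqrt


def deviation_sum(a, b):
    """Sum of products of deviations from the means, i.e. sum(x_a * x_b)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))


# Hypothetical observations, chosen only for illustration.
X1 = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
X2 = [1.0, 3.0, 2.0, 5.0, 6.0, 8.0]
X3 = [3.0, 1.0, 4.0, 2.0, 6.0, 5.0]

s11, s22, s33 = deviation_sum(X1, X1), deviation_sum(X2, X2), deviation_sum(X3, X3)
s12, s13, s23 = deviation_sum(X1, X2), deviation_sum(X1, X3), deviation_sum(X2, X3)

# Left side: squared partial correlation from the simple coefficients.
r12 = s12 / sqrt(s11 * s22)
r13 = s13 / sqrt(s11 * s33)
r23 = s23 / sqrt(s22 * s33)
left_side = (r12 - r13 * r23) ** 2 / ((1 - r13 ** 2) * (1 - r23 ** 2))

# Right side: gain in explained variation from adding X2 to X3, as a
# fraction of the variation left unexplained by X3 alone.
explained_by_3 = s13 ** 2 / s33
denominator = s22 * s33 - s23 ** 2
b12_3 = (s12 * s33 - s13 * s23) / denominator
b13_2 = (s13 * s22 - s12 * s23) / denominator
explained_by_23 = b12_3 * s12 + b13_2 * s13
right_side = (explained_by_23 - explained_by_3) / (s11 - explained_by_3)

print(left_side, right_side)
```

The two printed values agree to within floating-point error, whatever non-degenerate data are supplied.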


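Section XXIV-3

To show that $R_{1.234}^2 = 1 - [(1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2)]$.

In the expression

$$R_{1.234}^2 = 1 - (1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2),$$

$(1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2)$ is the proportion of variation that has not
been explained, that is

$$(1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2) = \frac{\Sigma x_1^2 - \Sigma x_{c1.234}^2}{\Sigma x_1^2}.$$

This is demonstrated as follows:

$$\begin{aligned}
&\left(1 - \frac{\Sigma x_{c1.4}^2}{\Sigma x_1^2}\right)\left(1 - \frac{\Sigma x_{c1.34}^2 - \Sigma x_{c1.4}^2}{\Sigma x_1^2 - \Sigma x_{c1.4}^2}\right)\left(1 - \frac{\Sigma x_{c1.234}^2 - \Sigma x_{c1.34}^2}{\Sigma x_1^2 - \Sigma x_{c1.34}^2}\right) \\
&= \left(\frac{\Sigma x_1^2 - \Sigma x_{c1.4}^2}{\Sigma x_1^2}\right)\left(\frac{\Sigma x_1^2 - \Sigma x_{c1.4}^2 - \Sigma x_{c1.34}^2 + \Sigma x_{c1.4}^2}{\Sigma x_1^2 - \Sigma x_{c1.4}^2}\right)\left(\frac{\Sigma x_1^2 - \Sigma x_{c1.34}^2 - \Sigma x_{c1.234}^2 + \Sigma x_{c1.34}^2}{\Sigma x_1^2 - \Sigma x_{c1.34}^2}\right) \\
&= \frac{(\Sigma x_1^2 - \Sigma x_{c1.4}^2)(\Sigma x_1^2 - \Sigma x_{c1.34}^2)(\Sigma x_1^2 - \Sigma x_{c1.234}^2)}{\Sigma x_1^2(\Sigma x_1^2 - \Sigma x_{c1.4}^2)(\Sigma x_1^2 - \Sigma x_{c1.34}^2)} \\
&= 1 - \frac{\Sigma x_{c1.234}^2}{\Sigma x_1^2}.
\end{aligned}$$

The expression $R_{1.234}^2 = 1 - [(1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2)]$ is of course
1 minus the proportion of variation that is unexplained, or the proportion
of variation that has been explained.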


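Section XXIV-4

To prove that
$$\bar{r}_{14.23}^2 = \frac{r_{14.23}^2(N - m + 1) - 1}{N - m}.$$

$$1 - r_{14.23}^2 = \frac{1 - R_{1.234}^2}{1 - R_{1.23}^2}, \quad\text{and}\quad r_{14.23}^2 = \frac{(1 - R_{1.23}^2) - (1 - R_{1.234}^2)}{1 - R_{1.23}^2}.$$

But
$$\bar{R}_{1.234}^2 = 1 - (1 - R_{1.234}^2)\,\frac{N - 1}{N - m}.$$

Also
$$\bar{R}_{1.23}^2 = 1 - (1 - R_{1.23}^2)\,\frac{N - 1}{N - m + 1}.$$

(In this expression, m - 1 is used instead of m, since $R_{1.23}$ involves one
less constant in the estimating equation than does $R_{1.234}$.) Therefore

$$\begin{aligned}
\bar{r}_{14.23}^2 &= 1 - \frac{1 - \bar{R}_{1.234}^2}{1 - \bar{R}_{1.23}^2} \\
&= 1 - \left(\frac{1 - R_{1.234}^2}{1 - R_{1.23}^2}\right)\left(\frac{N - m + 1}{N - m}\right) \\
&= 1 - \frac{(N - m + 1) - r_{14.23}^2(N - m + 1)}{N - m} \\
&= \frac{N - m - N + m - 1 + r_{14.23}^2(N - m + 1)}{N - m} \\
&= \frac{r_{14.23}^2(N - m + 1) - 1}{N - m}.
\end{aligned}$$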



APPENDIX C 

Aids to Calculation 


Figure 1. A Hand-Operated Burroughs Adding Machine.

Adding machine. Figure 1 shows a standard type of adding machine.
Its operation is very simple. Suppose one wishes to add 132, 356, and
1072. The procedure is as follows: (1) Press the total key and pull the
lever in order to clear the machine. This is an important step, for if there
is a number "in the machine," the final answer would be wrong by the
amount of that number. If the machine has already been cleared, a star
(without a number to the left of it) will now appear at the top of the
adding machine slip. (2) In the last column of keys depress key number 2;
in the adjoining row, key number 3; and in the third row from the
right, key 1. (3) Pull the large lever (or depress the activating key, if
machine is electric). (4) In similar fashion put 356 on the keyboard
and pull the lever. (5) Put 1072 on the keyboard and pull the lever.
(6) Press the total key and pull the lever, thereby obtaining the total
and clearing the machine. The adding machine slip appears thus:

* 

132 

356 

1072 

1560* 

(The older style machines require that the lever be pulled again before 
the total key is pressed.) If desired, a subtotal of 132 and 356 may be 
obtained by pressing the subtotal key after operation (4) above and pull- 
ing the handle, with these results: 

* 

132 
356 
488 S 
1072 
1560* 

Most adding machines are now made with subtraction keys, which make 
it possible to deduct values as desired. 

Calculating machine. A machine that will both multiply and divide 
(as well as add and subtract) is shown as Figure 2. 


Figure 2. A Hand-Operated Monroe Calculating Machine.

Addition and subtraction. To set the machine for addition or
subtraction, release the repeat key. To add a number put it on the keyboard the
same as with an adding machine and turn the handle forward (clockwise) ; 
to subtract it, turn the handle backward (counterclockwise). Turning the 
handle registers the result in the lower dial and clears the keyboard. 

Multiplication. Multiplication may be thought of as repeated addition.
Thus it would be possible to multiply 54 by 32 on the adding machine 
merely by putting 54 into the adding machine 32 times. Or it can be 
done as shown by the adding machine slip below: 

54 

54 

540 

540 

540 

1728 

The operation is performed in the same fashion by a calculating machine,
but no printed slip is obtained. The process follows: (1) Set the machine
for multiplication by pressing the repeat key. (2) Clear the machine,
so that only zeros appear in the dials and no figures are depressed on
the keyboard. (3) After the machine is cleared, 54 is put in the keyboard
by depressing keys 5 and 4. (4) The large handle is turned forward
twice. (5) The carriage is shifted once to the right and the handle
turned forward three times. 54 now appears on the keyboard, 32 in the
upper dial, and 1728 (the product) in the lower dial.

A short cut can be introduced when certain numbers are multiplied. 
For instance, to multiply 54 by 29, the easiest way is to multiply 54 by 30, 
and then subtract 54 once by turning the handle backward with the car- 
riage at the extreme left. 
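
The procedure of repeated addition with carriage shifts can be sketched in Python. This is an illustrative model only, not a description of any particular machine: the function name is an assumption, and each pass through the inner loop stands for one forward turn of the handle.

```python
def multiply_by_turns(multiplicand, multiplier):
    """Multiply two positive integers by repeated addition, shifting one
    decimal place for each digit of the multiplier. Returns the product
    and the number of handle turns used."""
    product = 0
    turns = 0
    shift = 0                      # carriage position (power of ten)
    while multiplier > 0:
        digit = multiplier % 10    # digit handled at this carriage position
        for _ in range(digit):
            product += multiplicand * 10 ** shift
            turns += 1
        multiplier //= 10
        shift += 1                 # shift the carriage one place
    return product, turns


print(multiply_by_turns(54, 32))   # two turns, shift, three turns
# Short cut for 54 x 29: multiply by 30, then subtract 54 once.
product_30, turns_30 = multiply_by_turns(54, 30)
print(product_30 - 54, turns_30 + 1)
```

Multiplying 54 by 29 directly takes eleven turns in this model, while the short cut takes four, which is why the text recommends it.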

Division. Division, which is merely repeated subtraction, is only slightly
more complicated, though strictly analogous to long division by hand.
If 1728 is to be divided by 32, the procedure is as follows: (1) Set the
carriage several spaces to the right. (2) Set up 1728 in the keyboard
and turn the handle once forward. 1728 will now appear in the lower dial
and 1 in the upper. (3) Clear the 1 out of the upper dial by turning
the small handle once forward (or by pressing the "clear" key and turning
the large handle once backward). (4) Clear the keyboard. (5) Set up
32 on the keyboard so that the 3 in 32 will be in the same column of keys
as the 7 in 1728. (Were the dividend 6728 the 3 would be placed under
the 6.) (6) Turn the handle backward until the bell rings once, then
turn it forward once, whereupon the bell will again ring. (7) Move the
carriage one space to the left and repeat step 6. (8) Repeat step 7 as
often as necessary. The answer appears in the upper dial.

As mentioned, the above method is equivalent to repeated subtraction.
Thus:

              1728
            - 32
1st turn      1408
            - 32
2nd turn      1088
            - 32
3rd turn       768
            - 32
4th turn       448
            - 32
5th turn       128   (Carriage is shifted at this point.)
             - 32
1st turn        96
             - 32
2nd turn        64
             - 32
3rd turn        32
             - 32
4th turn         0

The handle having been turned first five times and then four, the answer
is 54.
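
The same repeated subtraction can be written as a short Python sketch. It is a simplified model under stated assumptions: the function name is hypothetical, both arguments are positive integers, and the bell-and-overshoot step of the machine is replaced by a plain comparison before each subtraction.

```python
def divide_by_turns(dividend, divisor):
    """Divide by repeated subtraction, one decimal position at a time.
    Returns (quotient, remainder, turns made at each carriage position)."""
    # Place the divisor as far to the left as it will go under the dividend.
    shift = 0
    while divisor * 10 ** (shift + 1) <= dividend:
        shift += 1

    remainder = dividend
    turns_per_position = []
    while shift >= 0:
        step = divisor * 10 ** shift
        turns = 0
        while remainder >= step:       # one backward turn of the handle
            remainder -= step
            turns += 1
        turns_per_position.append(turns)
        shift -= 1                     # move the carriage one space
    quotient = int("".join(str(t) for t in turns_per_position))
    return quotient, remainder, turns_per_position


print(divide_by_turns(1728, 32))
```

For 1728 divided by 32 the sketch makes five subtractions of 320 and then four of 32, so the quotient digits are 5 and 4.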

Division by use of reciprocals. When a series of numbers is to be divided 
by the same number, the result can be obtained more easily by multiplying 
each of the numbers in turn by the reciprocal of the divisor. Thus: 

1728 ÷ 32 = 1728 × 1/32 = 1728 × .03125 = 54.
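
A minimal Python sketch of the reciprocal method follows. The function name and the series other than 1728 are hypothetical; the point is only that the reciprocal is found once and then used as a constant multiplier for every number in the series.

```python
def divide_series_by_reciprocal(values, divisor):
    """Divide every value by the same divisor by multiplying each one by
    the reciprocal of the divisor, which is computed only once."""
    reciprocal = 1 / divisor
    return [value * reciprocal for value in values]


# 1728 is the example from the text; 960 and 64 are hypothetical additions.
print(divide_series_by_reciprocal([1728, 960, 64], 32))
```

On a machine the saving comes from leaving the reciprocal set in the keyboard; in code the saving is negligible, and the sketch is meant only to mirror the hand procedure.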

The most frequent use of this method grows out of the need to express
each of a series of numbers as a percentage of their total. The numbers
147, 265, and 376 total 788. In order to state each of these as a percentage
of 788, we put the reciprocal of 788 (1/788 = .001269036) in the keyboard
and multiply by 147. The result is .187, or 18.7 per cent. Then, without
clearing the keyboard the upper dial is made to register 265, and the result
is .336, or 33.6 per cent. Next, 376 is put in the upper dial, giving .477,
or 47.7 per cent. (Frequently it will be convenient not to clear either
the keyboard or the dials, and merely to change the upper dial successively.
An entire problem is thus solved by this method without clearing
the machine.)

Automatic electric machines. Electrically operated calculating machines
with varying degrees of automatic control, which permit much more rapid
calculation than do the manually operated types, are also available.
Figures 3 and 4 illustrate two such machines that have automatic
multiplication and division features. The principle of operation is essentially the
same as with the manually operated machines, but the operation is easier.




Figure 3. An Automatic Electric Marchant Calculating Machine. 



Figure 4. An Automatic Electric Monroe Calculating Machine. 

To divide, it is necessary only to put the dividend in the middle dial for
the Marchant or the lower dial for the Monroe and the divisor on the
keyboard, then actuate the machine by pressing the "divide" key (or lever).
The quotient appears in the upper dial. In multiplying on the Marchant
machine, one puts the multiplicand in the keyboard, and figures in the
right-hand column corresponding to the multiplier are pressed in sequence.
The product appears in the middle dial. Multiplication on the Monroe
machine is similar to division, except that it is the "multiply" lever which
is used and the product appears in the lower dial.

Marchant, Monroe, and other makes of machines which are less com- 
pletely automatic than those pictured in Figures 3 and 4 but more fully 
equipped than that of Figure 2 may be bought. 



Figure 5. A Slide Rule.


Slide rule. Unlike a calculating machine the slide rule is only approxi- 
mately accurate, the accuracy depending on the size of the slide rule, the 
perfection of the materials and workmanship, and the skill of the user. 
The student should not expect to get more than three or four significant 
digits with a 10-inch slide rule. Also the slide rule does not automatically
locate the decimal; this is ordinarily done very easily by inspection. 

To multiply on a slide rule, set 1 on the slide (scale C) to the multipli- 
cand on the rule (scale D), and opposite the multiplier on the slide (scale C) 
read the product on the rule (scale D). Figure 5 illustrates the multipli- 
cation of 1.5 by 5.5, the result being 8.25. The student will observe that 
a slide rule is merely one or more pairs of logarithmic scales. If he re- 
members that the principle of such scales is that equal distances represent 
equal proportions, he should recognize the principle by which the above 
computation was made. It is that 1 : 5.5 :: 1.5 : 8.25. (The setting of 
the slide rule in Figure 5 illustrates just as well, of course, that 15 X 55 
= 825, or any other multiplication involving these digits.) Figure 6 illus- 
trates the multiplication of 750 by 25, with a result of 18,750. The pro- 
cedure is the same, except that it is necessary to set 10, instead of 1, on 75. 

To divide, set the divisor on the slide (scale C) to the dividend on the
rule (scale D), and read the quotient on the rule (scale D) opposite 1 or 10
on the slide (scale C). Figure 5 illustrates the computation 8.25 ÷ 5.5
= 1.5, while Figure 6 illustrates 18,750 ÷ 25 = 750.

To square, set the hair line of the cursor on scale D at the number to 
be squared and read the result from scale A at the point indicated by the 
hair line. An antithetical procedure is followed for extracting the square 
root. In Figure 6, the cursor is set to illustrate the squaring of 9, or the 
extraction of the square root of 81. 
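
The logarithmic principle behind these settings can be sketched in Python. The function names are assumptions made for illustration. Adding two lengths on logarithmic scales multiplies the numbers, subtracting divides them, and doubling a length (scale D read against scale A) squares the number.

```python
from math import log10


def slide_rule_multiply(a, b):
    """Add the two logarithmic lengths and read off the product."""
    return 10 ** (log10(a) + log10(b))


def slide_rule_divide(a, b):
    """Subtract the divisor's length from the dividend's length."""
    return 10 ** (log10(a) - log10(b))


def slide_rule_square(a):
    """Double the length on scale D, as read on scale A."""
    return 10 ** (2 * log10(a))


print(slide_rule_multiply(1.5, 5.5))    # about 8.25
print(slide_rule_divide(18750, 25))     # about 750
print(slide_rule_square(9))             # about 81
```

The code works to the full precision of floating-point arithmetic, whereas a physical rule yields only the three or four significant digits mentioned above.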

Figure 6. A Slide Rule.

Tables. In this volume, use has been made of squares, sums of squares,
square roots, reciprocals, and logarithms. Appendix O gives squares,
square roots, and reciprocals for numbers from 1 to 1,000. Appendices
M and N give the sums of powers of the first 100 natural numbers and of
the first 100 odd natural numbers. A brief table of logarithms is shown
as Appendix P.

More detailed tables of powers, roots, and reciprocals are given in
Barlow's Tables, published by Spon and Chamberlain, New York.

A seven-place table of logarithms is given by James W. Glover in his
Tables of Applied Mathematics in Finance, Insurance, Statistics, published
by George Wahr, Ann Arbor, Michigan.

Various useful tables will also be found in R. A. Fisher and F. Yates,
Statistical Tables for Biological, Agricultural, and Medical Research,
published by Oliver and Boyd, Ltd., London.



APPENDIX D 

Ordinates of the Normal Probability Curve 

Erected at Distances x/σ from the Mean, Expressed as Decimal Fractions of the
Maximum Ordinate Y₀

The maximum ordinate is computed from the expression

$$Y_0 = \frac{Ni}{\sigma\sqrt{2\pi}} = \frac{Ni}{2.5066\sigma}.$$

The values tabled below result from solving the expression

$$e^{-x^2/2\sigma^2}.$$

The proportional height of an ordinate to be erected at any given value on the X axis
can be read from the table by determining x (the deviation of the given value from the
mean) and computing x/σ. Thus if X̄ = $25.00, σ = $4.00, Y₀ = 1950 and it is desired
to ascertain the height of an ordinate to be erected at $23.00; x = $2.00 and
x/σ = $2.00/$4.00 = .50. From the table the ordinate is found to be .88250 of the
maximum ordinate Y₀, or .88250 × 1950 = 1721.


x/σ       0       1       2       3       4       5       6       7       8       9

0.0   1.00000  .99995  .99980  .99955  .99920  .99875  .99820  .99755  .99685  .99596
0.1    .99501  .99396  .99283  .99158  .99025  .98881  .98728  .98565  .98393  .98211
0.2    .98020  .97819  .97609  .97390  .97161  .96923  .96676  .96420  .96156  .95882
0.3    .95600  .95309  .95010  .94702  .94387  .94055  .93723  .93382  .93024  .92677
0.4    .92312  .91939  .91558  .91169  .90774  .90371  .89961  .89543  .89119  .88688
0.5    .88250  .87805  .87353  .86896  .86432  .85962  .85488  .85006  .84519  .84060
0.6    .83527  .83023  .82514  .82010  .81481  .80957  .80429  .79896  .79359  .78817
0.7    .78270  .77721  .77167  .76610  .76048  .75484  .74916  .74342  .73769  .73193
0.8    .72615  .72033  .71448  .70861  .70272  .69681  .69087  .68493  .67896  .67298
0.9    .66689  .66097  .65494  .64891  .64287  .63683  .63077  .62472  .61865  .61259
1.0    .60653  .60047  .59440  .58834  .58228  .57623  .57017  .56414  .55810  .55209
1.1    .54607  .54007  .53409  .52812  .52214  .51620  .51027  .50437  .49848  .49260
1.2    .48675  .48092  .47511  .46933  .46357  .45783  .45212  .44644  .44078  .43516
1.3    .42956  .42399  .41845  .41294  .40747  .40202  .39661  .39123  .38569  .38058
1.4    .37531  .37007  .36487  .35971  .35459  .34950  .34446  .33944  .33447  .32954
1.5    .32465  .31980  .31500  .31023  .30550  .30082  .29618  .29158  .28702  .28251
1.6    .27804  .27361  .26923  .26489  .26059  .25634  .25213  .24797  .24385  .23978
1.7    .23575  .23176  .22782  .22392  .22008  .21627  .21251  .20879  .20511  .20148
1.8    .19790  .19436  .19086  .18741  .18400  .18064  .17732  .17404  .17081  .16762
1.9    .16448  .16137  .15831  .15530  .15232  .14939  .14650  .14364  .14083  .13806
2.0    .13534  .13265  .13000  .12740  .12483  .12230  .11981  .11737  .11496  .11259
2.1    .11025  .10795  .10570  .10347  .10129  .09914  .09702  .09496  .09290  .09090
2.2    .08892  .08698  .08507  .08320  .08136  .07956  .07778  .07604  .07433  .07265
2.3    .07100  .06939  .06780  .06624  .06471  .06321  .06174  .06029  .05888  .05750
2.4    .05614  .05481  .05350  .05222  .05096  .04973  .04852  .04734  .04618  .04505
2.5    .04394  .04285  .04179  .04074  .03972  .03873  .03775  .03680  .03586  .03494
2.6    .03405  .03317  .03232  .03148  .03066  .02986  .02908  .02831  .02757  .02684
2.7    .02612  .02542  .02474  .02408  .02343  .02280  .02218  .02157  .02098  .02040
2.8    .01984  .01929  .01876  .01823  .01772  .01723  .01674  .01627  .01581  .01536
2.9    .01492  .01449  .01408  .01367  .01328  .01288  .01252  .01215  .01179  .01146

3.     .01111  .00819  .00598  .00432  .00309  .00219  .00153  .00106  .00073  .00050
4.     .00034  .00022  .00015  .00010  .00006  .00004  .00003  .00002  .00001  .00001
5.     .00000


Note: After 3.0, ordinates are shown for steps of .1 x/σ instead of .01 x/σ.

From Rugg's Statistical Methods Applied to Education, reprinted by arrangement with the
publishers, Houghton Mifflin Company. A more detailed table of normal curve ordinates
may be found in Karl Pearson, Tables for Statisticians and Biometricians, pp. 2-8, The
University Press, Cambridge, England, 1914. The values shown in Pearson's table should be
multiplied by √(2π) = 2.5066 to agree with those shown above.
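
The tabled ordinates can be reproduced with a short Python sketch. The function names are assumptions made for illustration; the expressions themselves are those stated at the head of this appendix.

```python
from math import exp, pi, sqrt


def relative_ordinate(x_over_sigma):
    """Height of the ordinate at x/sigma as a decimal fraction of the
    maximum ordinate: exp(-(x/sigma)**2 / 2)."""
    return exp(-x_over_sigma ** 2 / 2)


def maximum_ordinate(n, interval, sigma):
    """Y0 = N * i / (sigma * sqrt(2 * pi))."""
    return n * interval / (sigma * sqrt(2 * pi))


# Example from the text: ordinate at $23.00 when the mean is $25.00,
# sigma is $4.00, and the maximum ordinate is 1950.
ratio = relative_ordinate(2.00 / 4.00)
print(round(ratio, 5), round(ratio * 1950))
```

The sketch gives .88250 for x/σ = .50 and an ordinate of about 1721, in agreement with the worked example.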


APPENDIX E

Areas Under the Normal Probability Curve

From the Mean to Distances x/σ from the Mean, Expressed as Decimal Fractions of the
Total Area 1.0000

The proportional part of the curve included between an ordinate erected at the mean
and an ordinate erected at any given value on the X axis can be read from the table
by determining x (the deviation of the given value from the mean) and computing x/σ.
Thus if X̄ = $25.00, σ = $4.00, and it is desired to ascertain the proportion of the
area under the curve between ordinates erected at the mean and at $20.00; x = $5.00
and x/σ = $5.00/$4.00 = 1.25. From the table it is found that .3944, or 39.44 per cent,
of the entire area is included.


x/σ     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09

0.0    .0000   .0040   .0080   .0120   .0160   .0199   .0239   .0279   .0319   .0359
0.1    .0398   .0438   .0478   .0517   .0557   .0596   .0636   .0675   .0714   .0753
0.2    .0793   .0832   .0871   .0910   .0948   .0987   .1026   .1064   .1103   .1141
0.3    .1179   .1217   .1255   .1293   .1331   .1368   .1406   .1443   .1480   .1517
0.4    .1554   .1591   .1628   .1664   .1700   .1736   .1772   .1808   .1844   .1879
0.5    .1915   .1950   .1985   .2019   .2054   .2088   .2123   .2157   .2190   .2224
0.6    .2257   .2291   .2324   .2357   .2389   .2422   .2454   .2486   .2518   .2549
0.7    .2580   .2612   .2642   .2673   .2704   .2734   .2764   .2794   .2823   .2852
0.8    .2881   .2910   .2939   .2967   .2995   .3023   .3051   .3078   .3106   .3133
0.9    .3159   .3186   .3212   .3238   .3264   .3289   .3315   .3340   .3365   .3389
1.0    .3413   .3438   .3461   .3485   .3508   .3531   .3554   .3577   .3599   .3621
1.1    .3643   .3665   .3686   .3708   .3729   .3749   .3770   .3790   .3810   .3830
1.2    .3849   .3869   .3888   .3907   .3925   .3944   .3962   .3980   .3997   .4015
1.3    .4032   .4049   .4066   .4082   .4099   .4115   .4131   .4147   .4162   .4177
1.4    .4192   .4207   .4222   .4236   .4251   .4265   .4279   .4292   .4306   .4319
1.5    .4332   .4345   .4357   .4370   .4382   .4394   .4406   .4418   .4429   .4441
1.6    .4452   .4463   .4474   .4484   .4495   .4505   .4515   .4525   .4535   .4545
1.7    .4554   .4564   .4573   .4582   .4591   .4599   .4608   .4616   .4625   .4633
1.8    .4641   .4649   .4656   .4664   .4671   .4678   .4686   .4693   .4699   .4706
1.9    .4713   .4719   .4726   .4732   .4738   .4744   .4750   .4756   .4761   .4767
2.0    .4772   .4778   .4783   .4788   .4793   .4798   .4803   .4808   .4812   .4817
2.1    .4821   .4826   .4830   .4834   .4838   .4842   .4846   .4850   .4854   .4857
2.2    .4861   .4864   .4868   .4871   .4875   .4878   .4881   .4884   .4887   .4890
2.3    .4893   .4896   .4898   .4901   .4904   .4906   .4909   .4911   .4913   .4916
2.4    .4918   .4920   .4922   .4925   .4927   .4929   .4931   .4932   .4934   .4936
2.5    .4938   .4940   .4941   .4943   .4945   .4946   .4948   .4949   .4951   .4952
2.6    .4953   .4955   .4956   .4957   .4959   .4960   .4961   .4962   .4963   .4964
2.7    .4965   .4966   .4967   .4968   .4969   .4970   .4971   .4972   .4973   .4974
2.8    .4974   .4975   .4976   .4977   .4977   .4978   .4979   .4979   .4980   .4981
2.9    .4981   .4982   .4982   .4983   .4984   .4984   .4985   .4985   .4986   .4986
3.0    .49865  .4987   .4987   .4988   .4988   .4989   .4989   .4989   .4990   .4990
3.1    .49903  .4991   .4991   .4991   .4992   .4992   .4992   .4992   .4993   .4993

3.2    .4993129
3.3    .4995166
3.4    .4996631
3.5    .4997674
3.6    .4998409
3.7    .4998922
3.8    .4999277
3.9    .4999519
4.0    .4999683
4.5    .4999966
5.0    .4999997

From Rugg's Statistical Methods Applied to Education, reprinted by arrangement with
the publishers, Houghton Mifflin Company.
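
The tabled areas can be reproduced from the error function in the Python standard library. The sketch below is illustrative only, and the function name is an assumption.

```python
from math import erf, sqrt


def area_from_mean(x_over_sigma):
    """Proportion of the total area under the normal curve lying between
    the mean and an ordinate at the given x/sigma (one direction only)."""
    return 0.5 * erf(abs(x_over_sigma) / sqrt(2))


# Example from the text: mean $25.00, sigma $4.00, ordinate at $20.00.
print(round(area_from_mean(5.00 / 4.00), 4))
```

Because the curve is symmetrical, the sign of the deviation is ignored, so the same area is returned for $20.00 and for $30.00.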


APPENDIX F

Table of Values of t

For Given Degrees of Freedom (n) and at Specified Levels of Significance (P)

In the use of this table it is to be remembered that a level of significance refers to
both tails of the distribution. Thus, the .02 level (P = .02) includes .01 of the area
of the curve in each tail. It is to be observed that this table is set up in a different
form from the table of normal curve areas, Appendix E. The table of normal curve
areas showed values of x/σ in the margins and proportionate areas from X̄ to x/σ (one
direction only) in the body. A tail of the normal distribution is obtained by subtracting
this value from .5000. Doubling the resulting figure yields the level of significance.
The t table, on the other hand, shows n (degrees of freedom) in the stub, t in the body,
and P (the level of significance) in the caption. The last row of the t table, for n = ∞,
shows t values as obtained from the normal curve.


                                    Level of Significance (P)

  n     .9     .8     .7     .6     .5     .4     .3     .2     .1     .05     .02     .01    .001

  1   .158   .325   .510   .727  1.000  1.376  1.963  3.078  6.314  12.706  31.821  63.657  636.619
  2   .142   .289   .445   .617   .816  1.061  1.386  1.886  2.920   4.303   6.965   9.925   31.598
  3   .137   .277   .424   .584   .765   .978  1.250  1.638  2.353   3.182   4.541   5.841   12.941
  4   .134   .271   .414   .569   .741   .941  1.190  1.533  2.132   2.776   3.747   4.604    8.610
  5   .132   .267   .408   .559   .727   .920  1.156  1.476  2.015   2.571   3.365   4.032    6.859

  6   .131   .265   .404   .553   .718   .906  1.134  1.440  1.943   2.447   3.143   3.707    5.959
  7   .130   .263   .402   .549   .711   .896  1.119  1.415  1.895   2.365   2.998   3.499    5.405
  8   .130   .262   .399   .546   .706   .889  1.108  1.397  1.860   2.306   2.896   3.355    5.041
  9   .129   .261   .398   .543   .703   .883  1.100  1.383  1.833   2.262   2.821   3.250    4.781
 10   .129   .260   .397   .542   .700   .879  1.093  1.372  1.812   2.228   2.764   3.169    4.587

 11   .129   .260   .396   .540   .697   .876  1.088  1.363  1.796   2.201   2.718   3.106    4.437
 12   .128   .259   .395   .539   .695   .873  1.083  1.356  1.782   2.179   2.681   3.055    4.318
 13   .128   .259   .394   .538   .694   .870  1.079  1.350  1.771   2.160   2.650   3.012    4.221
 14   .128   .258   .393   .537   .692   .868  1.076  1.345  1.761   2.145   2.624   2.977    4.140
 15   .128   .258   .393   .536   .691   .866  1.074  1.341  1.753   2.131   2.602   2.947    4.073

 16   .128   .258   .392   .535   .690   .865  1.071  1.337  1.746   2.120   2.583   2.921    4.015
 17   .128   .257   .392   .534   .689   .863  1.069  1.333  1.740   2.110   2.567   2.898    3.965
 18   .127   .257   .392   .534   .688   .862  1.067  1.330  1.734   2.101   2.552   2.878    3.922
 19   .127   .257   .391   .533   .688   .861  1.066  1.328  1.729   2.093   2.539   2.861    3.883
 20   .127   .257   .391   .533   .687   .860  1.064  1.325  1.725   2.086   2.528   2.845    3.850

 21   .127   .257   .391   .532   .686   .859  1.063  1.323  1.721   2.080   2.518   2.831    3.819
 22   .127   .256   .390   .532   .686   .858  1.061  1.321  1.717   2.074   2.508   2.819    3.792
 23   .127   .256   .390   .532   .685   .858  1.060  1.319  1.714   2.069   2.500   2.807    3.767
 24   .127   .256   .390   .531   .685   .857  1.059  1.318  1.711   2.064   2.492   2.797    3.745
 25   .127   .256   .390   .531   .684   .856  1.058  1.316  1.708   2.060   2.485   2.787    3.725

 26   .127   .256   .390   .531   .684   .856  1.058  1.315  1.706   2.056   2.479   2.779    3.707
 27   .127   .256   .389   .531   .684   .855  1.057  1.314  1.703   2.052   2.473   2.771    3.690
 28   .127   .256   .389   .530   .683   .855  1.056  1.313  1.701   2.048   2.467   2.763    3.674
 29   .127   .256   .389   .530   .683   .854  1.055  1.311  1.699   2.045   2.462   2.756    3.659
 30   .127   .256   .389   .530   .683   .854  1.055  1.310  1.697   2.042   2.457   2.750    3.646

 40   .126   .255   .388   .529   .681   .851  1.050  1.303  1.684   2.021   2.423   2.704    3.551
 60   .126   .254   .387   .527   .679   .848  1.046  1.296  1.671   2.000   2.390   2.660    3.460
120   .126   .254   .386   .526   .677   .845  1.041  1.289  1.658   1.980   2.358   2.617    3.373
  ∞   .126   .253   .385   .524   .674   .842  1.036  1.282  1.645   1.960   2.326   2.576    3.291

This table is taken by consent from Statistical Tables for Biological, Agricultural, and
Medical Research, by Prof. R. A. Fisher and F. Yates, published by Oliver and Boyd,
Edinburgh. A table of t, similar in arrangement to that of Appendix E, giving areas of
the t distribution from the mean to t (in one direction) and for n = 1 to n = 20 may be
found in “New Tables for Testing the Significance of Observations,” by “Student,”
Metron, Vol. V, No. 3 (1925), pages 114-118.
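
As a check on the bottom row of the table (n = ∞), where the t distribution coincides with the normal curve, the entries can be recomputed from the normal distribution. The Python sketch below is illustrative only. It assumes that the columns are headed by the two-tailed probabilities P = .9, .8, ..., .001 of the Fisher and Yates table, which is not stated in this part of the text; the names `P_VALUES` and `normal_deviate` are hypothetical.

```python
from statistics import NormalDist

# Two-tailed probabilities assumed for the thirteen columns of the t table.
P_VALUES = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.001]


def normal_deviate(p_two_tailed):
    """Deviate exceeded in absolute value with probability p_two_tailed."""
    return NormalDist().inv_cdf(1.0 - p_two_tailed / 2.0)


# Entries for n = infinity, rounded to three decimals as in the table.
infinity_row = [round(normal_deviate(p), 3) for p in P_VALUES]
print(infinity_row)
```

For finite n the same comparison needs the t distribution itself, which the Python standard library does not supply.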
APPENDIX G

Values of z at the .05, .01, and .001 Points of the Distribution of z for Specified Values of $n_1$ and $n_2$

Based on tables in R. A. Fisher, Statistical Methods for Research Workers, by permission of Professor Fisher and Oliver and Boyd, Ltd. Values of z
at the .20 point are shown in R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, p. 28, Oliver and Boyd, Ltd.,
Edinburgh, 1938.

In the above table the values of z at the .001 point for $n_1 = 1$, $n_2 = 1$ and for $n_1 = 12$, $n_2 = 2$ have been corrected to agree with those given in
Fisher and Yates, op. cit., p. 34.
APPENDIX H

Values of L at the .05 and .01 Levels of Significance for the Distribution
of L for Specified Values of N and k, when $N_1 = N_2 = \cdots = N_k = N$

If L has been computed from samples of varying size, take

$$N = \frac{N_1 + N_2 + \cdots + N_k}{k},$$

provided that no sample N is less than 15 or 20.



        N = 3         N = 4         N = 5         N = 6         N = 7         N = 8         N = 9
  k   .05   .01    .05   .01    .05   .01    .05   .01    .05   .01    .05   .01    .05   .01
  2  .312  .141   .478  .284   .585  .398   .656  .485   .708  .551   .745  .603   .775  .645
  3  .304  .162   .470  .314   .576  .429   .648  .514   .700  .578   .739  .628   .769  .667
  4  .315  .188   .480  .345   .585  .459   .656  .542   .707  .604   .744  .652   .774  .689
  5  .328  .210   .491  .370   .595  .484   .665  .565   .714  .624   .751  .670   .780  .706
  6  .339  .230   .502  .391   .604  .504   .673  .583   .721  .641   .757  .685   .785  .720
  7  .350  .246   .512  .409   .612  .520   .680  .597   .727  .654   .763  .697   .790  .730
  8  .359  .260   .520  .424   .620  .534   .686  .610   .733  .665   .768  .707   .795  .740
  9  .367  .273   .527  .437   .626  .545   .691  .620   .738  .674   .772  .716   .798  .747
 10  .374  .284   .534  .448   .631  .555   .696  .629   .742  .682   .776  .722   .802  .753
 12  .387  .303   .545  .467   .641  .572   .704  .644   .749  .696   .782  .734   .807  .764
 14  .397  .318   .554  .481   .649  .585   .711  .655   .755  .706   .787  .744   .812  .773
 16  .405  .331   .561  .493   .655  .596   .716  .665   .759  .714   .791  .751   .816  .779
 18  .412  .342   .567  .504   .660  .605   .721  .672   .763  .721   .795  .756   .819  .784
 20  .418  .352   .573  .512   .665  .613   .725  .679   .767  .727   .798  .761   .822  .788
 22  .424  .360   .577  .520   .669  .619   .728  .684   .770  .732   .800  .766   .824  .792
 24  .428  .367   .581  .526   .672  .624   .731  .688   .772  .736   .802  .768   .826  .795
 26  .433  .373   .585  .532   .675  .629   .734  .693   .775  .740   .805  .772   .828  .798
 28  .437  .379   .589  .537   .678  .634   .736  .697   .777  .744   .807  .776   .829  .802
 30  .441  .386   .592  .543   .681  .639   .739  .703   .779  .748   .809  .781   .831  .806


        N = 10        N = 12        N = 15        N = 20        N = 30        N = 60         N = ∞
  k   .05   .01    .05   .01    .05   .01    .05   .01    .05   .01    .05   .01     .05    .01
  2  .798  .678   .833  .730   .868  .783   .902  .836   .935  .890   .968  .945   1.000  1.000
  3  .792  .699   .828  .748   .863  .798   .898  .848   .933  .898   .967  .949   1.000  1.000
  4  .797  .719   .832  .765   .866  .812   .900  .859   .934  .906   .967  .953   1.000  1.000
  5  .802  .736   .836  .779   .870  .823   .903  .867   .936  .911   .968  .956   1.000  1.000
  6  .808  .748   .841  .789   .873  .832   .906  .874   .938  .916   .969  .958   1.000  1.000
  7  .812  .757   .844  .798   .876  .839   .908  .879   .939  .920   .970  .960   1.000  1.000
  8  .816  .766   .848  .806   .879  .844   .910  .884   .941  .923   .971  .962   1.000  1.000
  9  .819  .773   .851  .811   .881  .849   .912  .887   .942  .925   .971  .963   1.000  1.000
 10  .822  .779   .853  .816   .883  .853   .913  .890   .943  .927   .972  .964   1.000  1.000
 12  .828  .789   .857  .824   .887  .860   .916  .896   .944  .931   .973  .966   1.000  1.000
 14  .832  .796   .861  .831   .890  .865   .918  .900   .946  .933   .973  .967   1.000  1.000
 16  .835  .802   .863  .836   .892  .870   .920  .903   .947  .936   .974  .968   1.000  1.000
 18  .838  .807   .866  .840   .894  .873   .921  .905   .948  .937   .974  .969   1.000  1.000
 20  .840  .811   .868  .844   .896  .876   .922  .907   .949  .939   .975  .970   1.000  1.000
 22  .843  .814   .870  .847   .897  .878   .924  .909   .950  .940   .976  .970   1.000  1.000
 24  .844  .817   .872  .850   .898  .880   .924  .911   .950  .941   .975  .971   1.000  1.000
 26  .846  .820   .873  .852   .899  .882   .925  .912   .951  .942   .976  .971   1.000  1.000
 28  .848  .823   .874  .854   .900  .884   .926  .914   .951  .943   .976  .972   1.000  1.000
 30  .849  .827   .876  .856   .901  .886   .927  .915   .952  .944   .976  .972   1.000  1.000


Based on a table in “An Investigation Into the Application of Neyman and Pearson’s $L_1$
Test, with Tables of Percentage Limits,” by P. P. N. Nayer, Statistical Research Memoirs,
Vol. I (1936), pp. 38-51, by permission of the author. An earlier table of the same nature
is given in “Tables for the Application of L-Tests,” by P. C. Mahalanobis, Sankhya: The
Indian Journal of Statistics, Vol. I, Part 1 (June 1933), pp. 109-122.
APPENDIX I

Values of $\chi^2$

For Given Degrees of Freedom (n) and for Specified Values of P


Value of P

.99   .98   .95   .90   .80   .70   .50   .30   .20   .10   .05   .02   .01   .001

1 

000157 

.00062S 

.00393 

.0158 

.0642 

.148 

.455 

1074 

1642 

2.706 

3 841 

5 412 

6635 

10 827 

2 

.0201 

.0404 

.103 

.211 

.446 

.713 

1386 

2408 

3 219 

4 605 


7 1 

r •> “1 


3 

116 

.185 

.352 

584 

1.005 

1524 

2566 

3 665 

4 642 

6,251 

'' 1 

* s,," 

1 ' 

' 'll 1> V 

4 

.297 

.429 

711 

1064 

1649 

2195 

3357 

4 878 

5 989 

7 779 
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.654 

.762 

1 145 

1610 

2 343 

3000 

4 351 

6 064 

7 289 

9 236 

11070 

13 388 

15 086 

20 617 

6 

.872 

1 134 

1635 

2504 

3070 

3828 

5348 

7231 

£ 
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1 239 

1564 

2 167 

2833 

3522 

4 671 

6 346 

8383 

£■« . 
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1 646 

2032 

2 733 

3490 

4594 

5 527 

7344 

9524 

11 ■ 
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. 1 . 

-* 0 

2 2‘» 

8 

2088 

2532 

3 325 

4168 

6380 

6593 

8543 

10 656 

1 

, ■. 



2 

2“ '>77 

10 

2.658 

3059 

3.940 

4565 

6.179 

7267 

9 342 

11781 

J - 

"" 

> " 

- 

' 

'2 . ■'S 

11 

3053 

3609 

4.575 

5578 

6.989 

ai48 

10341 

12899 

14 631 

17575 

19675 

22 618 

24 725 

31 264 

12 

3 671 

4 178 

5226 

6504 

7807 

9034 

11340 

14 011 

15 812 

18 549 

21026 

24 054 

26 217 

32 909 

13 

4.107 

4 765 

5 892 

7042 

8634 

9926 

12 340 

15119 

16 985 

19 812 

22 362 

25 472 

27 688 

34 528 

14 

4.660 

5 368 

6 571 

7790 

9467 

10821 

13 339 

16 222 

18 151 

21 064 

23 685 

26S73 

29 141 

36 123 

16 

5.229 

5 985 

7.261 

8547 

10507 

11721 

14 339 

17522 

19 311 

22 307 

24 996 

28 259 

30 578 

37 697 

16 

5 812 

6614 

7962 

9512 

11152 

12624 

15538 

18418 

20 4C5 

23 542 

26 296 

29 033 

32 000 

39 252 

17 

6 408 

7.255 

8.672 

10085 

12.002 

13531 

16538 

19611 

21 615 

24 769 

27 587 

30 995 

33 409 

40 790 

18 

7 015 

7 906 

9.390 

10 865 

'12 857 

14440 

17338 

20C01 

22 760 

25 989 

28869 

32 346 

34 805 

42 312 

19 

7 633 

8.567 

10117 

11651 

13.716 

15552 

18338 

21 689 

23 900 

27 204 

30 144 

33 687 

36 191 

43 820 

20 

8260 

9 237 

10.851 

12.443 

14 578 

16266 

19537 

22775 

25 038 

28 412 

31410 

35 020 

37 666 

45515 

21 

SJB&T 

9915 

11591 

13540 

15.445 

17 182 

20 337 

23858 

26171 

29 615 

32 671 

36 343 

38 932 

46.797 

22 

9 642 

10600 

12.33S 

14 041 

16514 

ini 

21 

04 MO 

OT 001 


33 924 

37 659 

40 289 

48 268 

23 

10 196 

11.293 

13 091 

14548 

17 187 

I O'l 

22 >. 

- '' 1 

■ « 

■_ 1 

35172 

38 968 

41638 

49 728 

24 

10 866 

11992 

13 84S 

15 659 

I&062 

19 943 

23537 ' 
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,, , 

r '■ 1 ' 

1 27 

■> .n 

. ' 1 ■» 

26 

11.524 

I2i697 

14.611 

16473 

18940 

20567 

24 337 

1 


0 J 

I 

.r 

1 


‘ j ' I 

.-2 iiJ-l 

26 

12 198 

18.409 

15579 

17592 

19520 

21792 

25536 

29.246 

31 795 

35563 1 

38885 

42 856 1 

45642 

54 052 

27 

12 879 

14.125 

16 151 

1SU4 

20703 

22 7 0 

.. 70 ' 

r>*.“ ■» 

.*2 2 




. * - 3 ; 

■■ ■71', 

28 

13 665 

14347 

18 928 

18 939 

21 588 1 

2 fiT 

< ‘V 

1 •>- 

>, 

, - 

' 

‘.7e» i 


29 

14.256 

15.574 

17708 

19768 

22 475 

I. ~“7 


po 1 


r - 

... 

► > 



30 

14.953 

16J06 

18.493 

20.599 

23364 

25 508 1 

29 336 

33 530 1 

36 250 1 

40 266 1 

43773 1 

47 962 1 

50 892 1 

59 703 


For large values of n compute $\sqrt{2\chi^2}$, the distribution of which is approximately
normal around a mean of $\sqrt{2n - 1}$ with $\sigma = 1$. P is the ratio of one tail of the normal
distribution to the area under the entire curve.
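
The large-n rule just stated can be turned round to give an approximate value of $\chi^2$ for a chosen P. The Python sketch below is illustrative only: the function name `chi_square_large_n` is hypothetical, and the normal deviate is taken from the standard library rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def chi_square_large_n(n, p):
    """Approximate chi-square exceeded with probability p for n degrees of freedom.

    Treats sqrt(2 * chi2) as normal with mean sqrt(2n - 1) and unit standard
    deviation, and reads p as the area in one tail of that normal curve.
    """
    z = NormalDist().inv_cdf(1.0 - p)          # normal deviate cutting off a tail of area p
    return (z + sqrt(2.0 * n - 1.0)) ** 2 / 2.0


# At the edge of the table (n = 30, P = .05) the rule gives roughly 43.5,
# against the tabled 43.773; the agreement improves as n grows.
print(round(chi_square_large_n(30, 0.05), 2))
```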

This table is taken by consent from Statistical Tables for Biological, Agricultural, and
Medical Research, by Prof. R. A. Fisher and F. Yates, published at 12/6 by Oliver and
Boyd, Edinburgh.

A detailed table of the probability of various values of $\chi^2$ for one degree of freedom is
given in G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, 11th
edition, pp. 534-535, Charles Griffin and Co., London, 1937.
[Figure: two charts of the $\chi^2$ distribution. Vertical axis: relative height of ordinate. Horizontal axis: value of $\chi^2$.]

Distribution of $\chi^2$ for n = 1, n = 5, n = 9, and n = 17. The maximum
ordinate is at $\chi^2 = n - 2$ except when n = 1. When n = 1, the maximum
ordinate is at $\chi^2 = 0$. When n = 1, there is 4.55 per cent of the
curve beyond $\chi^2 = 4$. Beyond $\chi^2 = 30$ there is .0015 of one per cent
of the curve when n = 5; .0439 of one per cent of the curve when n = 9;
2.6345 per cent of the curve when n = 17. The two charts have been
drawn to different scales. If the vertical axis of the upper chart is
expanded to approximately 20 times its length and the horizontal axis is
contracted to about one-eighth of its length, the curves will be roughly
comparable as to area.
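
The percentages quoted in this caption can be checked numerically. The Python sketch below is illustrative only: it uses the finite series for the tail area of $\chi^2$ that holds when n is odd, which is a standard result and not a method given in the text, and the function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def chi_square_tail_odd(x, n):
    """Fraction of the chi-square curve lying beyond x, for odd n only."""
    if n < 1 or n % 2 == 0:
        raise ValueError("this sketch handles odd degrees of freedom only")
    chi = sqrt(x)
    tail = erfc(chi / sqrt(2.0))               # both normal tails beyond chi (the n = 1 case)
    ordinate = exp(-x / 2.0) / sqrt(2.0 * pi)  # normal ordinate at chi
    term = chi                                 # chi^(2r-1) / (1*3*...*(2r-1)) for r = 1
    for r in range(1, (n - 1) // 2 + 1):
        tail += 2.0 * ordinate * term
        term *= x / (2 * r + 1)                # step to the term for r + 1
    return tail


def chi_square_mode(n):
    """Location of the maximum ordinate: n - 2, or 0 when n is 1 or 2."""
    return max(n - 2, 0)


for n, x in ((1, 4.0), (5, 30.0), (9, 30.0), (17, 30.0)):
    print(n, x, f"{100.0 * chi_square_tail_odd(x, n):.4f} per cent")
```

Even values of n would need a different series (or a general incomplete gamma routine), which is why the sketch refuses them.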
APPENDIX J


Ya 


o'V'2Tr 


Values of F2 

For Use in Fitting Curves of the Type 



cr 

.00 

01 

.02 

03 

04 

05 

.06 

07 

08 

.09 

.0 

00000 

00001 

00004 

00009 

00016 

.00025 

00036 

00049 

.00064 

00081 

.1 

00099 

00120 

.00143 

00167 

.00194 

00222 

.00253 

00285 

00319 

.00355 

.2 

.00392 

00432 

00473 

.00616 

00561 

00607 

00656 

00705 

.00757 

00810 

.3 

00865 

00921 

00979 

01038 

01099 

.01161 

01225 

.01290 

01356 

01424 

.4 

01493 

01564 

01635 

01708 

01782 

01857 

01933 

02011 

.02089 

02168 

.5 

02248 

02329 

02411 

.02494 

02578 

02662 

02748 

.02833 

.02920 

.03007 

.6 

03095 

03183 

03272 

03361 

03450 

.03540 

03631 

.03721 

03812 

.03904 

.7 

03995 

.04086 

04178 

04270 

04362 

04453 

.04545 

.04637 

.04728 

04820 

.8 

04911 

05002 

05093 

05183 

05274 

05363 

05453 

05542 

05031 

05719 

9 

05806 

.05894 

.05980 

.06066 

0C152 

06236 

06320 

06404 

.06486 

.06568 

1 0 

06649 

06729 

06809 

06887 

06965 

07042 

07118 

07193 

07267 

.07340 

1 1 

07412 

.07483 

07552 

07621 

07689 

07756 

07S22 

07886 

07950 

.08012 

1 2 

08073 

0S133 

08192 

08250 

08306 

08361 

0S416 

08468 

08520 

08571 

1.3 

08620 

08668 

08715 

08760 

08805 

08848 

08890 

08930 

08970 

09008 

14 

09045 

0908U 

09115 

09148 

09180 

09211 

09241 

.09269 

.09296 

.09322 

15 

09347 

09371 

09394 

09415 

.09435 

09454 

09472 

09489 

09506 

09519 

1 6 

09533 

09546 

09657 

09567 

09577 

09585 

09592 

09599 

.09604 

.09608 

1 7 

09612 

09614 

09616 

09616 

09616 

09615 

09613 

09610 

09606 

09602 

1 8 

.09597 

09590 

09584 

09576 

09568 

09559 

09549 

.09539 

09527 

09516 

19 

09503 

09490 

.09477 

.09463 

09448 

09433 

09417 

09401 

09384 

.09366 

20 

.09349 

.09330 

09312 

.09293 

09273 

09253 

09233 

.09213 

09192 

.09170 

2 1 

.09149 

09127 

09106 

090S2 

09060 

09037 

09014 

.08991 

08967 

.08943 

22 

.08919 

08895 

08871 

.08847 

.08823 

08798 

08774 

08749 

08724 

.08699 

2 3 

08674 

08650 

08625 

08600 

.08575 

08550 

08525 

08500 

.08476 

08450 

24 

.08426 

08401 

.08376 

08352 

08327 

.08303 

.08279 

08256 

.08231 

08207 

25 

.08183 

08159 

i .08136 

08112 

08089 

08066 

.08043 

08020 

.07998 

.07975 

26 

.07953 

07931 

07909 

07SS8 

.07866 

.07845 

07824 

.07803 

07782 

.07762 

27 

07742 

.07722 

07702 

07682 

07663 

.07644 

.07625 

.07606 

07588 

.07669 

28 

.07551 

.07534 

07516 

07499 

07482 

.07465 

.07448 

.07432 

.07416 

.07400 

29 

3.0 

31 

32 

33 

34 

35 

36 

37 

38 

39 

4.0 

07384 

07240 

07118 

07016 

06933 

06866 

06813 

06771 

.06739 

06714 

06696 

06683 

.07369 

.07354 

07339 

.07324 

.07309 

.0r295 

.07281 

.07267 

.07254 


From W. A. Shewhart, Economic Control of Quality of Manufactured Product, p. 91, D.
Van Nostrand Company, Inc., New York, 1931. Courtesy of D. Van Nostrand Company,
Inc. and The Bell Telephone Laboratories.

For values of $F_2(x/\sigma)$ beyond the range shown above, use the expression

$$F_2 = \frac{1}{6\sqrt{2\pi}}\left[1 - \left(1 - \frac{x^2}{\sigma^2}\right)e^{-x^2/2\sigma^2}\right].$$

The values of $e^{-x^2/2\sigma^2}$ may be conveniently read from the table of ordinates of the normal curve, Appendix D, or
from a more extensive table in Karl Pearson, Tables for Statisticians and Biometricians, pp. 2-8,
The University Press, Cambridge, England, 1914. The values for z shown in the latter
table yield $e^{-x^2/2\sigma^2}$ when multiplied by 2.5066.
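
The tabled values can be reproduced, and the table extended past $x/\sigma = 4.0$, by direct computation. The Python sketch below is illustrative only. It assumes the closed form $F_2 = \left[1 - (1 - z^2)e^{-z^2/2}\right] / (6\sqrt{2\pi})$ with $z = x/\sigma$, a form inferred here because it agrees with the printed entries (for example .02248 at 0.5, .06649 at 1.0, and .09349 at 2.0); the function name `f2` is hypothetical.

```python
from math import exp, pi, sqrt


def f2(z):
    """F2 at z = x / sigma, under the closed form assumed in the lead-in."""
    return (1.0 - (1.0 - z * z) * exp(-z * z / 2.0)) / (6.0 * sqrt(2.0 * pi))


# A few entries inside the table, then one beyond its range.
for z in (0.5, 1.0, 1.73, 2.0, 4.0, 5.0):
    print(f"{z:4.2f}  {f2(z):.5f}")
```

Under this form the function rises to its maximum near $z = \sqrt{3}$ and then settles toward $1/(6\sqrt{2\pi})$, which matches the slow decline of the entries from 2.9 to 4.0.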
* Leap Year; February has 29 days. † 1900 was not a Leap Year. f Good Friday occurred in March. e Easter occurred in March.



APPENDIX L 

Brief Table of Sines and Cosines 

To find the sine or cosine of any angle greater than 90°, subtract the angle from some
integral multiple of 180° and use the table below. Signs to be prefixed to table values
are as follows:

Angle (degrees)    Sin    Cos
0-90                +      +
90-180              +      -
180-270             -      -
270-360             -      +

When using this table to determine correlation by Sheppard’s method of Unlike Signs
(cos U × 1.8°), values of the coefficient are negative when U × 1.8° exceeds 90°.


Degree   Sine   Cosine     Degree   Sine   Cosine     Degree   Sine    Cosine
   0    .0000   1.0000       30    .5000   .8660        60    .8660    .5000
   1    .0175    .9998       31    .5150   .8572        61    .8746    .4848
   2    .0349    .9994       32    .5299   .8480        62    .8829    .4695
   3    .0523    .9986       33    .5446   .8387        63    .8910    .4540
   4    .0698    .9976       34    .5592   .8290        64    .8988    .4384
   5    .0872    .9962       35    .5736   .8192        65    .9063    .4226
   6    .1045    .9945       36    .5878   .8090        66    .9135    .4067
   7    .1219    .9925       37    .6018   .7986        67    .9205    .3907
   8    .1392    .9903       38    .6157   .7880        68    .9272    .3746
   9    .1564    .9877       39    .6293   .7771        69    .9336    .3584
  10    .1736    .9848       40    .6428   .7660        70    .9397    .3420
  11    .1908    .9816       41    .6561   .7547        71    .9455    .3256
  12    .2079    .9781       42    .6691   .7431        72    .9511    .3090
  13    .2250    .9744       43    .6820   .7314        73    .9563    .2924
  14    .2419    .9703       44    .6947   .7193        74    .9613    .2756
  15    .2588    .9659       45    .7071   .7071        75    .9659    .2588
  16    .2756    .9613       46    .7193   .6947        76    .9703    .2419
  17    .2924    .9563       47    .7314   .6820        77    .9744    .2250
  18    .3090    .9511       48    .7431   .6691        78    .9781    .2079
  19    .3256    .9455       49    .7547   .6561        79    .9816    .1908
  20    .3420    .9397       50    .7660   .6428        80    .9848    .1736
  21    .3584    .9336       51    .7771   .6293        81    .9877    .1564
  22    .3746    .9272       52    .7880   .6157        82    .9903    .1392
  23    .3907    .9205       53    .7986   .6018        83    .9925    .1219
  24    .4067    .9135       54    .8090   .5878        84    .9945    .1045
  25    .4226    .9063       55    .8192   .5736        85    .9962    .0872
  26    .4384    .8988       56    .8290   .5592        86    .9976    .0698
  27    .4540    .8910       57    .8387   .5446        87    .9986    .0523
  28    .4695    .8829       58    .8480   .5299        88    .9994    .0349
  29    .4848    .8746       59    .8572   .5150        89    .9998    .0175
                                                        90   1.0000    .0000
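
The use of this table with Sheppard's method of unlike signs can be sketched in Python. The sketch is illustrative only: it assumes that U is the percentage of paired deviations having unlike signs, so that U × 1.8° runs from 0° to 180°, and the function name `unlike_signs_r` is hypothetical.

```python
from math import cos, radians


def unlike_signs_r(unlike_pairs, total_pairs):
    """Coefficient by the unlike-signs method: the cosine of U x 1.8 degrees.

    U is taken as the percentage of pairs whose two deviations differ in sign.
    """
    u = 100.0 * unlike_pairs / total_pairs
    return cos(radians(u * 1.8))


# With 25 per cent unlike signs the angle is 45 degrees; with 75 per cent it is
# 135 degrees, beyond 90 degrees, so the coefficient is negative.
print(round(unlike_signs_r(25, 100), 4))
print(round(unlike_signs_r(75, 100), 4))
```

The library cosine handles angles beyond 90° directly, so the rule of signs given above is needed only when working from the printed table.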
APPENDIX M

Sums of First Six Powers of First 50 Natural Numbers

A table of the sums of the squares of the first 100 natural numbers may be found in
Croxton and Cowden, Practical Business Statistics, page 494. A table of the sums of
the first 7 powers of the first 100 natural numbers may be found in Pearson, Tables for
Statisticians and Biometricians, pages 40-41. The sums of the powers of the first M
natural numbers may also be computed by the following formulae:

$$\sum_{1}^{M} X = \frac{M(M+1)}{2}, \qquad \sum_{1}^{M} X^2 = \frac{2M+1}{3}\sum_{1}^{M} X, \qquad \sum_{1}^{M} X^3 = \left(\sum_{1}^{M} X\right)^2,$$

$$\sum_{1}^{M} X^4 = \frac{3M^2 + 3M - 1}{5}\sum_{1}^{M} X^2, \qquad \sum_{1}^{M} X^5 = \frac{2M^2 + 2M - 1}{3}\sum_{1}^{M} X^3, \qquad \sum_{1}^{M} X^6 = \frac{3M^4 + 6M^3 - 3M + 1}{7}\sum_{1}^{M} X^2.$$

M is the highest value of X used in the computation table. When the X origin is taken at the center
of the X values, it is necessary to multiply the summation value of this table by 2. In this case N as
used in the normal equations is 2M + 1.
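
Any row of the table may be verified by direct addition. The Python sketch below is illustrative only; the function names `power_sums` and `centered_sums` are hypothetical, and `centered_sums` simply applies the doubling rule stated above for an origin taken at the center of the X values.

```python
def power_sums(m, max_power=6):
    """Sums of X, X^2, ..., X^max_power for X = 1, 2, ..., m, by direct addition."""
    return [sum(x ** p for x in range(1, m + 1)) for p in range(1, max_power + 1)]


def centered_sums(m, max_power=6):
    """Number of items N and the sums of even powers when X runs from -m to +m.

    The tabled sums are doubled; the odd-power sums cancel to zero, and zero
    itself adds nothing to the even-power sums.
    """
    sums = power_sums(m, max_power)
    n_items = 2 * m + 1
    even_sums = {p: 2 * sums[p - 1] for p in range(2, max_power + 1, 2)}
    return n_items, even_sums


print(power_sums(10))     # [55, 385, 3025, 25333, 220825, 1978405]
print(centered_sums(3))   # (7, {2: 28, 4: 196, 6: 1588})
```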


M     ΣX     ΣX²     ΣX³     ΣX⁴     ΣX⁵     ΣX⁶     (each sum taken from X = 1 to X = M)

1 

1 

1 

1 

1 

1 

1 ' 

2 

3 

5 

9 

17 

33 

65 

3 

6 

14 

36 

98 

276 

794 

4 

10 

30 

100 

351 

1 300 

4 890 

5 

15 

55 

225 

979 

4 425 

20 515 

C 

21 

91 

441 

2 275 

12 201 

67 171 

7 

28 

140 

784 

4 676 

29 uOS 

184 820 

S 

36 

204 

1 296 

8 772 

61 776 

446 964 

9 

45 

285 

2 025 

15 333 

120 825 

978 405 

10 

55 

385 

3 025 

25 333 

220 825 

1 978 406 

11 

66 

506 

4 356 

39 974 

381 874 

3 749 966 

12 

78 

650 

6 084 

60 710 

630 70S 

6 735 950 

13 

91 

819 

8 281 

89 271 

1 002 001 

11 662 759 

14 

105 

1 015 

11 025 

127 687 

1 539 825 

19 092 295 

15 

120 

1 240 

14 400 

178 312 

2 299 200 

30 482 920 

16 

136 

1 496 ' 

18 496 

234 848 

3 347 776 

47 260 136 

17 

153 

1 785 

23 409 

327 369 

4 767 633 

71 397 705 

18 

171 

2 109 

29 241 

432 345 

6 657 201 

105 409 929 

19 

190 

2 470 

36 100 

562 666 

9 133 300 

152 455 810 

20 

210 

2 870 

44 100 

722 6C6 

12 333 300 

216 455 810 

21 

231 

3 311 

53 361 

917 147 

16 417 401 

302 221 931 

22 

253 

3 795 

64 009 

1 151 403 

21 571 033 

415 601 835 

23 

276 

4 324 

76 176 

1 431 244 

28 007 376 

563 037 724 

24 

300 

4 900 

90 000 

1 763 020 

35 970 000 

754 740 700 

25 

325 

5 525 

105 625 

2 153 645 

45 735 625 

998 881 325 

26 

351 

6 201 

123 201 

2 610 621 

57 617 001 

1 307 797 101 

27 

378 

6 930 

1 142 884 

3 142 062 

71 965 908 

1 695 217 590 

28 

406 

7 714 

i 164 836 

3 756 718 

89 176 276 

2 177 107 894 

29 

435 

8 555 

189 225 

4 463 999 

109 687 425 

2 771 931 216 

30 

465 

9 455 

216 225 

5 273 999 

133 987 425 

3 500 931 215 

31 

496 

10 416 

246 016 

6 197 620 

162 616 576 

4 388 434 896 

32 

528 

11 440 

278 784 

7 246 096 

196 171 008 

5 462 176 720 

33 

561 

12 529 

314 721 

8 432 017 

235 306 401 

6 753 644 689 

34 

695 

13 685 

354 025 

9 768 353 

280 741 825 

8 298 449 105 

35 

630 

14 910 

396 900 

11 268 978 

333 263 700 

10 136 714 730 

36 

666 

16 206 

443 556 

12 948 594 

393 729 876 

12 313 497 066 

37 

703 

17 575 

494 209 ; 

14 822 755 

463 073 833 

14 879 223 475 

38 

741 ’ 

19 019 

549 081 ' 

16 907 891 

542 309 001 

17 890 159 859 

39 

780 

20 640 

608 400 

19 221 332 

632 533 200 

21 408 903 620 

40 

820 

22 140 

672 400 

21 781 332 

734 933 200 

25 504 903 620 

41 

861 

23 821 

741 321 

24 607 093 

8f0 789 401 

30 255 007 861 

42 

903 

25 585 

815 409 

27 718 789 

981 480 633 

35 744 039 605 

43 

946 

27 434 

894 916 

31 137 590 

1 128 489 076 

42 065 402 664 

44 

990 

29 370 

980 100 

34 885 686 

1 293 405 300 

49 321 716 610 

45 

1 035 

31 395 

1 071 225 

38 986 311 

1 477 933 425 

57 625 482 135 

46 

1 081 

33 611 

1 168 561 

43 ^63 767 

1 683 896 401 

67 099 779 033 

47 

1 128 

35 720 

1 272 384 

48 343 448 

1 913 241 408 

77 878 994 360 

48 

1 176 

38 024 

1 382 976 

53 851 864 

2 168 045 376 

90 109 584 824 

49 

1 225 

40 425 

1 500 625 

59 416 665 

2 450 520 625 

103 960 872 025 

60 

1 275 

42 925 

1 626 625 

65 666 665 

2 763 020 625 

119 575 872 026 
APPENDIX N

Sums of the First Six Powers of the First 50 Odd Natural Numbers

A table of the sums of the squares of the first 100 odd natural numbers
may be found in Croxton and Cowden, Practical Business Statistics, page
495. A table of the sums of the first 6 powers of the first 100 odd natural
numbers is given in Ross, “Formulae for Facilitating Computations in
Time Series Analysis,” Journal of the American Statistical Association,
March 1925, pages 75-79.

The sums of the powers of the first $M_o$ odd natural numbers may be
computed by the following formulae:

$$\sum_{1}^{M_o} X_o = M_o^2, \qquad \sum_{1}^{M_o} X_o^2 = \frac{M_o(4M_o^2 - 1)}{3}, \qquad \sum_{1}^{M_o} X_o^3 = (2M_o^2 - 1)\sum_{1}^{M_o} X_o,$$

$$\sum_{1}^{M_o} X_o^4 = \frac{12M_o^2 - 7}{5}\sum_{1}^{M_o} X_o^2, \qquad \sum_{1}^{M_o} X_o^5 = \frac{16M_o^4 - 20M_o^2 + 7}{3}\sum_{1}^{M_o} X_o, \qquad \sum_{1}^{M_o} X_o^6 = \frac{48M_o^4 - 72M_o^2 + 31}{7}\sum_{1}^{M_o} X_o^2.$$

M is the highest value of $X_o$ (odd value of X) used in the computation table. $M_o$ may be ascertained
by reference to the first two columns of this appendix or from the expression $M_o = \frac{M + 1}{2}$. When the
X origin is taken at the center of the $X_o$ values, it is necessary to multiply the summation value of this
table by 2. In this case N as used in the normal equation is $2M_o$.
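
The sums for odd numbers may likewise be obtained by direct addition. The Python sketch below is illustrative only; the function names `odd_power_sums` and `count_of_odd_values` are hypothetical.

```python
def count_of_odd_values(m):
    """M_o, the number of odd values 1, 3, ..., M, from the highest odd value M."""
    if m < 1 or m % 2 == 0:
        raise ValueError("M must be a positive odd number")
    return (m + 1) // 2


def odd_power_sums(m_o, max_power=6):
    """Sums of the first six powers of the first m_o odd natural numbers."""
    odds = range(1, 2 * m_o, 2)                # 1, 3, ..., 2*m_o - 1
    return [sum(x ** p for x in odds) for p in range(1, max_power + 1)]


# The odd values 1, 3, 5 (M = 5, so M_o = 3) give 9, 35, 153, 707, 3369, 16355.
print(odd_power_sums(count_of_odd_values(5)))
```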
APPENDIX O

Squares, Square Roots, and Reciprocals 

1-1000 


No 

Square 

Square Root 

Reciprocal 


No. 

Square 

Square Root 

Reciprocal 

1 

1 

1.0000000 

1.000000000 


51 

26 01 

7.1414284 

.019607843 

2 

4 

1.4142136 

0.500000000 


52 

27 04 

7-2111026 

019230769 

3 

9 

1.7320508 

.333333333 


53 

28 09 

7.2801099 

.018867925 

4 

16 

2.0000000 

.250000000 


54 

2916 

7.3484692 

.018518519 

5 

25 

2.23C0680 

.200000000 


55 

30 25 

7.4161985 

.018181818 

6 

36 

2.4494897 

.166660667 


56 

31*36 

7.4833148 

.017857143 

7 

49 

2.6457513 

.142857143 


57 

32 49 

7.5498344 

.017543860 

8 

64 

2 8284271 

.125000000 


-58 

33 64 

7.6157731 

.017241379 

9 

81 

3.0000000 

.111111111 


59 

34 81 

7.6811457 

,016949153 

10 

100 

3.16227?7 

.100000000 


60 

36 00 

7.7459867 

.016666667 

11 

121 

3 3166248 

.090909091 


61 

37 21 

7.8102497 

.010393443 

12 

144 

3.4641016 

.083333333 


62 

3$ 44 

7.8740079 

.016129032 

13 

169 

3.6055513 

.076923077 


63 

39 69 

7.9372539 

.015873016 

14 

1 96 

3.7416574 

.071428571 


64 

40 96 

8 OOOOOOO 

.015625000 

15 

2 25 

3.8729833 

.066666607 


65 

42 25 

8.0622577 

.015384615 

16 

2 56 

4.0000000 

.062500000 


66 

43 56 

8 1240384 

.015151515 

17 

2 89 

4.1231056 

.058823529 


67 

44 89 

8.1853528 

.014925373 

18 

324 

4.2426407 

.055555556 


CS 

46 24 

8.2462113 

.014705882 

19 

3 61 

4.3588989 

.052631579 


69 

47 61 

8 3060239 

.014492754 

20 

4 00 

4.4721360 

.050000000 


70 

49 00 

8.3G06003 

.014285714 

21 

441 

4.5825757 

.047619048 


71 

50 41 

8.4201498 

-0140S4507 

22 

4 84 

4 6904158 

.045454545 


72 

51 84 

8.4852814 

0138SSS89 

23 

6 29 

4.7958315 

.043478261 


73 

53 20 

8.5440037 

.013698030 

24 

5 76 

4.89S9795 

.041666667 


74 

54 76 

8.C023253 

.013513514 

25 

6 25 

5 0000000 

.040000000 


^5 

66 25 

8 6602540 

.013333333 

26 

6 76 

5 0990195 ; 

.038461538 


76 

57 76 

8 7177979 

.013157895 

27 

7 29 

5.19615241 

.037037037 


77 

59 29 

8.7749644 

,012987013 

28 

1 7 84 

5.2915026 

.035714286 


78 

60 84 

8 8317609 

.012820513 

29 

i 8 41 

5 3851648 

-034482750 


70 

62 41 

8-SSS1944 

.012658228 

30 

1 900 

i 5.4772256 

.033333333 


80 

64 00 

8.9442719 

.012500000 

31 

9 61 

5.5677644 

.032258005 


SI 

65 61 

' 9.0000000 

.012345679 

32 

10 24 

5 6568542 

.031250000 


82 

67 21 

9.0553S51 

.012195122 

33 

10 89 

5.7445G26 

.030303030 


83 

68 89 

9.1104336 

.012048193 

34 

1156 

5,8309519 

.029411705 


84 

70 50 

9.1651514 

.011904762 

35 

12 25 

5.9160795 

.028571429 


85 

72 25 

9.2195145 

.011704706 

36 

12 96 

6.0000000 

.027777778 


86 

73 90 

9.2736185 

; .011627907 

37 

13 69 

6.0827625 

,027027027 


87 

75 69 

9.3273791 

' .011494253 

38 

14 44 

6.1G44I40 

.026315789 


88 

77 44 

9.SS0S315 

.011303636 

39 

15 21 

6.2449980 

.025641026 


89 

79 21 

9.4339811 

.011235955 

40 

16 00 

6.3245553 

.025000000 


90 

81 00 

9,4868330 

.011111111 

41 

16 81 

6.4031242 

.024390244 


91 

82 81 

9.5393920 

.010989011 

42 

17 64 

6.4807407 

-023809524 


92 

84 64 

9-5910630 

.010809505 

43 

IS 49 

6.5574385 

.023255814 


0$ 

86 49 

9.6436508 

010752688 

44 

19 36 

6.6332496 

,022727273 


94 

88 36 “ 

9.6953597 

.010638298 

45 

20 25 

6.7082039 

.022222222 

95 

90 25 

9.7467943 

010526316 

46 

21 16 

6.7823300 

.021739130 

96 

9216 

9.7979590 

.010416667 

47 

22 09 

6.8556546 

.021276598 

97 

94 09 

9.8488578 

.010309278 

48 

23 04 

6.9282032 

.020833333 

98 

96 04 

9 8994949 

,010204082 

49 

2401 

7,0000000 

.020408163 

99 

98 01 

9.9498744 

.010101010 

50 

25 00 

7.0710678 

.020000000 

100 

100 00 

10,0000000 

.010000000 
No.

Square 

Square Root 

Reciprocal 

.00 

101 

102 01 

10 0498756 

9900990 

102 

104 04 

10 0995049 

9803922 

103 

106 09 

10.1488916 

9708738 

104 

10816 

10.1980390 

9615385 

105 

1 10 25 

10.2469508 

9523810 

106 

112 36 

10.2956301 

9433962 

107 

1 14 40 

10.3440804 

9345794 

108 

1 16 64 

10.3923048 

9259259 

109 

118 81 

10.4403065 

9174312 

no 

12100 

10.4880885 

9090909 

111 

123 21 

10.5356538 

9009009 

112 

125 44 

10.5S30052 

8928571 

113 

127 69 

10.6.3014.58 

8840558 

114 

1 29 96 

10 6770783 

8771930 

U5 

132 25 

10.7238053 

8605652 

116 

134 56 

10.7703296 

8620690 

117 

1 36 89 

10.81CG538 

8547009 

118 

139 24 

10.8G27S05 

8474576 

119 

14161 

10.0087121 

8403361 

120 

144 00 

10.9544512 

8333333 

121 

140 41 

11.0000000 

8264403 

122 

148 84 

11.0453610 

8196721 

123 

15120 

11.0005365 

8130081 

124 

15376 

11.1355287 

8064516 

125 

156 25 

11.1803309 

8000000 

126 

158 76 

11.2249722 

7936508 

127 i 

16129 

1X^694277 

7S74016 

128 

163 84 

11.3137085 

7812500 

129 ! 

10641 

11.3578167 

7751938 

130 ' 

169 00 

11.4017513 

7692308 

131 

1 71 61 1 

11.4455231 

7633588 

132 

1 74 24 

11.4891253 

7575758 

133 

17689 

11.5325026 

7518797 

134 

179 56 

11.5758369 

7462087 

135 

182 25 

11.61S0500 

7407407 

136 

184 96 

11.6019035 

7352941 

137 

18769 

11.7046999 

7299270 

138 

1 90 44 

11.7473401 

7246377 

139 

19321 

11.7898261 

7194245 

140 

19600 

11.8321596 

7142857 

141 

19S 81 

11.8743422 

7092199 

142 

2 01 64 

11.9163753 

7042254 

143 

2 0449 

11.9582607 

6993007 

144 

2 07 36 

12 0000000 

6944444 

145 

21025 

12.0415946 

CS96552 

146 

213 16 

12 0830460 

6849315 

147 

1 216 09 

12.1243557 

6802721 

148 

219 04 

12.1655251 

6756757 

149 

2 22 01 

12 2065556 

6711409 

150 

2 25 00 

12.2474487 

6666667 


No. 

Square 

Square Root 

Reciprocal 

00 

151 

2 28 01 

12 2SS2057 

6622517 

152 

2 3104 

12 32SS280 

6578947 

153 

234 09 

12 3693169 

6535948 

154 

2 3716 

12 4006736 

6493506 

155 

2 40 25 

12 440S996 

6451613 

156 

2 4336 

12 4899960 

6410256 

157 

24649 

12 5200G4I 

C369427 

158 

2 40 64 

12 5G0SO51 

63201 14 

159 

2 52 81 

12 6095202 

62S9308 

160 

2 56 00 

12.6491106 

C250000 

IGl 

2 50 21 

12.0885775 

6211180 

162 

2 02 44 

12.7279221 

6172S40 

163 

2 65 69 

12 7671453 

6131969 

164 

2 GS 96 

12 8062485 

6007561 

165 

2 7225 

12.8452326 

6060606 

166 

2 75 50 

12 8840087 

6024096 

1G7 

2 78 80 

12 922S4S0 

5988024 

108 

282 24 

12.9614814 

5952381 

160 

2S5C1 

13 oonoooo 

5917160 

170 

2 89 00 

13 0384048 

5882353 

171 

2 9241 

13 076G968 

5S47953 

172 

2 05 84 

13 1148770 

5S 13953 

173 

2 00 29 

13 1529404 

5780347 

174 

3 0276 

13.1909060 

5747126 

175 

3 06 25 

13 2287566 

57142S6 

176 

3 00 76 

13 2664992 

5GS1S18 

177 

313 29 

13.3041347 

5649718 

178 

81684 

13.3416641 

5617978 

179 

3 20 41 1 

13.3790882 

5.586592 

ISO 

32400 : 

13.4164079 

5555556 

181 

3 27 61 

13.4536240 

5524882 

182 

3 3124 

13.4007376 

5494505 

183 

3 3489 

13.5277493 

5464481 

184 

i 3 38 56 

' 13.5646600 

5434783 

185 

i 3 42 25 

13 6014705 

5405405 

186 

345 96 

13.6381817 

5376344 

187 

349 69 

13 6747943 

5347594 

188 

3 53 44 1 

13,7113092 

5319149 

189 

357 24 

13.7477271 

5291005 

190 

3 6100 

13.7840488 

' 52G3158 

191 

; 3 64 81 

13 8202750 

5235602 

192 

36S04 

^ 13.8564065 

5208333 

193 

f 37249 

13.8924440 

5131347 

194 

' 3 76 36 

13.92S3S83 

5154639 

195 

38025* 

13.9642400 

5128205 

196 

1 3 84 16 ^ 

14.0000000 

5102041 

197 

1 3 88 09 

14 0356GSS 

5076142 

198 

3 9204; 

14 0712473 

5050505 

199 

1 39601 

14 1067360 

5025126 

200 

40000* 

14 1421356 

5000000 


S93 



No. 

Sqoara 

Square Root 

Reciprocal 

.00 

No. 

Square 

Square Root 

Reciprocal 

00 

201 

4 04 01 

14.1774469 

4975124 

251 

6 30 01 

15.8429795 

3984064 

202 

4 08 04 

14.2126704 

4950495 

252 

6 35 04 

15.8745079 

3968254 

203 

412 09 

14.247S068 

4926108 

253 

6 40 09 

15.9059737 

3952569 

204 

41616 

14.2828569 

4901961 

254 

6 4516 

15.9373775 

3937008 

205 

4 20 25 

14.3178211 

4878049 

255 

6 50 25 

15,9087194 

3921569 

206 

4 2436 

14.3527001 

4854369 


256 

6 55 36 

16.0000000 

39Q6250 

207 

4 28 49 

I4.3S74946 

4830918 


257 

6 6049 

16.0312195 

3891051 

208 

432 64 

14,4222051 

4807692 


258 

6 65 64 

16.0623784 

3875969 

209 

436 81 

14,4568323 

4784689 


259 

6 70 81 

16,0934769 

3861004 

210 

4 41 00 

14.4913767 

4761905 


260 

6 76 00 

16.1245155 

3846154 

211

4 45 21 

14.5258390 

4739336 


261 

6 8121 

16.1554944 

3831418 

212 

449 44 

14,5602198 

4716981 


262 

6 86 44

16.1864141 

3816794 

213 

4 53 69 

14.5945195

4694836 


263 

6 91 69 

16.2172747 

3802281 

214 

4 57 96 

14.6287388 

4672897 


264 

6 96 96 

16.2480768 

3787879 

215 

4 62 25 

14,6628783 

4651163 


265 

7 0225 

16.2788206 

3773585 

216 

4 66 56 

14.6969385 

4629630 


266 

7 07 56 

16.3095064 

3759398 

217 

4 70 89 

14.7309199 

4608295 


267 

712 89 

16 3401346 

3745318 

218 

475 24 

14,7648231 

4587156 


268 

718 24 

16.3707055 

3731343 

219 

4 79 61 

14.7986486 

4566210 


269 

7 23 61 

16,4012195 

3717472 

220 

4 84 00 

14.8323970 

4545455 


270 

7 29 00 

16.4316767 

3703704 

221 

488 41 

14.8660687 

4524887


271 

73441 

16.4620776

3690037 

222 

4 92 84 

14.8996644 

4504505


272 

739 84 

16.4924225 

3676471 

223 

4 97 29 

14.9331845 

4484305 


273 

7 45 29 

16.5227116 

3663004 

224 

501 76 

14.9666295 

4464286 


274 

75076 

16.5529454 

3649635 

225 

506 25 

15 0000000 

4444444 


275 

7 56 25

16 5831240 

3636364 

226 

510 76 

15.0332964

4424779 


276 

7 61 76

16 6132477 

3623188 

227

515 29 

15.0665192 

4405286 


277 

7 67 29

16.6433170 

3610108 

228

519 84 

15.0996689 

4385965 


278 

7 72 84

16.6733320 

3597122 

229 

5 24 41 

15.1327460 

4366812 


279 

7 78 41 

16.7032931 

3584229 

230 

529 00 

15.1657509 

4347826 


280 

7 84 00 

16.7332005 

3571429 

231 

533 61 

15.1986842 

4329004 


281 

7 89 61

16.7630546 

3558719 

232 

5 38 24

15.2315462 

4310345 


282 

7 95 24 

16.7928556 

3546099 

233 

542 89 

15.2643375

4291845 


283 

8 00 89 

16.8226038

3533569 

234 

547 56 

15.2970585

4273504


284 

8 06 56 

16 8522995 

3521127 

235 

5 52 25 

15.3297097

4255319 


285 

812 25 

16.8819430 

3508772 

236 

5 56 96

15.3622915

4237288 


286 

817 96 

16.9115345

3496503

237 

5 61 69

15.3948043 

4219409 


287 

8 23 69 

16.9410743 

3484321 

238 

5 66 44 

15,4272486 

4201681 


288 

8 29 44 

16.9705627 

3472222 

239 

5 71 21 

15.4596248 

4184100 


289 

8 35 21 

17.0000000 

3460208 

240 

576 00 

15.4919334 

4166667 


290 

8 41 00 

17.0293864 

3448276 

241 

5 80 81 

15.5241747 

4149378 


291 

8 46 81 

17.0587221 

3436426 

242 

5 8564 

15.5563492 

4132231 


292 

8 52 64 

17.0880075 

3424658 

243 

590 49 

15.5884573 

4115226 


293 

8 58 49 

17.1172428 

3412969 

244 

5 9536 

15,6204994 

4098361 


294 

8 64 36 

17.1464282 

3401361 

245 

6 00 25 

15.6524758 

4081633 


295 

8 70 25 

17,1755640 

3389831 

246 

6 05 16 

15.6843871 

4065041


296 

8 76 16 

17.2046505 

3378378 

247 

610 09 

15.7162336 

4048583


297 

88209 

17.2336879 

3367003 

248 

615 04 

15.7480157 

4032258


298 

8 8804 

17.2626765 

3355705

249 

620 01 

15.7797338 

4016064 


299 

8 9401 

17.2916165 

3344482 

250 

625 00 

15 8113883 

4000000 


300 

90000 

17.3205081 

3333333 





No.

Square

Square Root

Reciprocal

301 

9 0601 

17.3493516

3322259 

302 

912 04 

17.3781472 

3311258 

303 

91809 

17.4068952

3300330

304 

9 24 16

17.4355968 

3289474 

305 

9 30 25 

17.4642492 

3278689 

306 

9 36 36

17.4928557

3267974 

307 

9 42 49

17.5214155

3257329

308 

9 48 64 

17.5499288

3246753 

309 

9 54 81

17.5783958

3236246 

310 

9 6100 

17.6068169 

3225806 

311 

9 67 21 

17.6351921 

3215434 

312 

9 7344 

17.6635217 

3205128

313 

9 79 69 

17.6918060

3194888 

314 

9 85 96 

17.7200451

3184713 

315

9 92 25 

17.7482393 

3174603 

316 

9 98 56 

17.7763888 

3164557 

317 

10 04 89 

17.8044938 

3154574 

318 

10 11 24 

17.8325545

3144654 

319

10 17 61

17.8605711

3134796 

320 

10 24 00 

17.8885438

3125000 

321 

10 30 41 

17.9164729 

3115265 

322 

10 36 84

17.9443584

3105590 

323

10 43 29 

17.9722008 

3095975 

324

10 49 76

18.0000000

3086420 

325 

10 56 25 

18.0277564

3076923 

326 

10 62 76 

18.0554701

3067485 

327 

10 69 29 

18.0831413

3058104 

328

10 75 84

18.1107703

3048780 

329 

10 82 41 

18.1383571

3039514 

330

10 89 00 

18.1659021 

3030303 

331 

10 9561 

18.1934054 

3021148 

332 

1102 24 

18.2208672

3012048 

333

11 08 89

18.2482876 

3003003 

334 

11 15 56

18.2756669

2994012 

335 

1122 25 

18.3030052 

2985075

336 

11 28 96 

18.3303028 

2976190 

337 

1135 69 

18.3575598 

2967359 

338 

114244 

18 .3847763 

2958580 

339 

1149 21 

18.4119526 

2949853

340

11 56 00

18.4390889

2941176

341 

1162 81 

18.4661853 

2932551 

342 

11 69 64 

18.4932420 

2923977 

343 

117649 

18.5202592

2915452 

344 

1183 36 

18.5472370

2906977

345

11 90 25

18.5741756

2898551 

346 

119716 

18.6010752

2890173

347 

12 04 09 

18.6279360 

2881844

348 

12 11 04 

18.6547581

2873563

349 

12 18 01 

18.6815417 

2865330 

350 

12 25 00

18.7082869

2857143


No. 

Square 

Square Root 

Reciprocal 

351 

352 

353 

12 3201 
1239 04 
12 46 09 

18.7349940 

18.7616630 

18.7882942

2849003 

2840909 

2832861 

354 

355 

356 

1253 16 

12 60 25 

126736 

18.8148877
18.8414437
18.8679623

2824859 

2816901 

2808989

357 

358 

359 

1274 49 
12 8164 
12 88 81 

18.8944436 

18.9208879 

18.9472963 

2801120 

2793296 

2785515 

360 

361 

362 

1296 00 
13 03 21 
13 10 44 

18.9736660 

19.0000000 

19.0262976 

2777778 

2770083 

2762431 

363 

364 

365 

131769 
13 24 96 
1332 25 

19.0525589 

19.0787840 

19.1049732 

2754821 

2747253 

2739726 

366 

367 
368

13 39 56
13 46 89
13 54 24

19,1311265 

19.1572441 

19.1833261 

2732240 

2724796 

2717391 

369

370 

371 

136161 
13 69 00
13 76 41 

19.2093727 

19.2353841 

19.2613603 

2710027 

2702703 

2695418 

372 
373
374

13 83 84
13 91 29
13 98 76

19.2873015

19.3132079 

19.3390796 

2688172 

2680965

2673797 

375
376
377

14 06 25
14 13 76
14 21 29

19.3649107 

19.3907194 

19.4164878

2666667

2659574

2652520 

378
379

380

14 28 84
14 36 41
14 44 00

19.4422221 

19.4679223 

19.4935887

2645503

2638522

2631579

381

382

383

14 5161 
14 59 24 
14 6689 

19.5192213 

19.5448203

19.5703868 

2624672 

2617801 

2610966 

384
385

386

14 74 56 
14 82 25 
1489 96 

19.5959179 

19.6214169

19.6468827 

2604167 

2597403 

2590674 

387 
388
389

14 9769 

15 05 44 
15 1321 

19.6723156 

19.6977156 

19.7230829 

2583979

2577320 

2570694 

390 

391

392 

15 2100 
15 28 81 
1536 64 

19.7484177 

19.7737199 

19.7989899 

2564103 

2557545 

2551020 

393 

394 

395 

154449 
15 52 36 
15 60 25 

19.8242276 

19.8494332 

19.8746069 

2544529

2538071 

2531646 

396 

397 

398 

15 68 16 
15 76 09 
15 84 04 

19.8997487 

19.9248588 

19.9499373 

2525253

2518892

2512563

399 

400 

15 92 01 
160000 

19.9749844 

20.0000000 

2506266 

2500000 










No, 

Square 

Square Root 

Reciprocal 


No 

Square 

Square Root 

Reciprocal 

.00 

401 

16 0801 

20.0249844 

2493766 


451 

20 34 01 

21.2307606 

2217295 

402 

16 16 04 

20.0499377 

2487562 


452 

20 43 04 

21.2602916 

2212389 

403

16 24 09 

20.0748599 

2481390 


453 

20 52 09 

21.2837967 

2207506 

404

16 32 16

20.0997512 

2475248 


454 

20 61 16 

21.3072758 

2202643 

405 

16 40 25 

20.1246118 

2469136 


455 

20 7025 

21 3307290 

2197802 

406 

164836 

20.1494417 

2463054 


456 

20 7936 

21,3541565 

2192982 

407 

16 5649 

20.1742410

2457002 


457 

20 88 49 

21.3775583 

2188184 

408 

16 64 64 

20.1990099 

2450980 


458 

20 97 64 

21.4009346 

2183406 

409 

16 72 81 

20.2237484 

2444988 


459 

21 0681 

21.4242853 

2178649 

410 

16 81 00 

20,2484567 

2439024 


460 

21 16 00 

21.4476106 

2173913 

411 

16 89 21 

20,2731349 

2433090 


461 

21 25 21 

21 4709106 

2169197 

412 

16 9744 

20.2977831 

2427184 


462 

21 3444 

21.4941853 

2164502 

413 

17 05 69 

20.3224014 

2421308 


463 

21 43 69 

21,5174348 

2159827 

414 

1713 96 

20.3469899 

2415459 


464 

21 52 96 

21.5406592 

2155172 

415 

17 22 25 

20.3715488 

2409639 


465 

21 6225 

21.5638587 

2150538

416 

17 30 56 

20.3960781 

2403846 


466 

21 71 56 

21 5870331 

2145923 

417 

1738 89 

20.4205779 

2398082 


467 

21 80 89 

21.6101828 

2141328 

418 

1747 24 

20 4450483 

2392344 


468 

21 90 24 

21.6333077 

2136752 

419 

17 55 61 

20 4694895 

2386635 


469 

21 99 61 

21 6564078 

2132196 

420 

17 64 00 

20 4939015 

2380952 


470 

22 09 00 

21 6794834 

2127660 

421 

17 72 41 

20 5182845 

2375297 


471 

22 1841 

21 7025344 

2123142 

422 

17 80 84 

20.5426386

2369668 


472 

22 27 84 

21 7255610 

2118644 

423 

17 89 29 

20 5669638 

2364066 


473 

22 37 29 

21.7485632 

2114165 

424 

17 97 76 

20 5912603 

2358491 


474 

22 46 76 

21 7715411 

2109705 

425 

18 06 25 

20 6155281 

2352941 


475 

22 56 25 

21 7944947 

2105263 

426 

1814 76 

20 6397674 

2347418 


476 

22 65 76 

21 8174242 

2100840 

427 

18 23 29

20 6639783 

2341920 


477 

22 75 29

21 8403297 

2096436 

428 

18 3184 

20 6881609 

2336449 


478 

22 8484 

21 8632111 

2092050 

429 

18 40 41 

20 7123152 

2331002 


479 

22 94 41 

21 8860686 

2087683 

430 

1849 00 

20 7364414 

2325581 


480 

23 04 00 

21 9089023 

2083333 

431 

18 57 61 

20 7605395 

2320186 


481 

23 13 61 

21 9317122 

2079002 

432 

18 66 24 

20 7846097 

2314815 


482 

23 23 24 

21 9544984 

2074689 

433 

18 74 89 

20 8086520 

2309469 


483 

23 32 89 

21 9772610 

2070393 

434 

1883 56 

20 8326667 

2304147 


484 

23 42 56 

22 0000000 

2066116 

435 

18 92 25

20 8566536 

2298851 


485 

23 52 25 

22.0227155

2061856

436

19 00 96

20 8806130 

2293578 


486 

23 61 96 

22.0454077

2057613 

437 

19 09 69 

20 9045450 

2288330 


487 

23 71 69 

22.0680765

2053388 

438 

1918 44 

20 9284495 

2283105 


488 

23 81 44 

22.0907220 

2049180 

439 

19 27 21 

20 9523268 

2277904 


489 

23 91 21 

22 1133444 

2044990 

440 

19 36 00 

20 9761770 

2272727 


490 

24 01 00 

22 1359436 

2040816 

441 

19 44 81 

21 0000000 

2267574 


491 

24 10 81 

22 1585198 

2036660 

442 

19 53 64 

21 0237960 

2262443 


492 

24 20 64 

22 1810730 

2032520 

443 

1962 49 

21 0475652 

2257336 


493 

24 30 49

22.2036033 

2028398 

444 

19 71 36 

21 0713075 

2252252 


494 

24 40 36 

22 2261108 

2024291 

445 

198025 

21 0950231 

2247191 


495 

24 50 25 

22.2485955

2020202 

446 

19 89 16 

21 1187121 

2242152 


496 

24 60 16 

22 2710575 

2016129 

447 

19 98 09 

21 1423745 

2237136 


497 

24 70 09 

22 2934968 

2012072 

448 

20 07 04 

21 1660105 

2232143 


498 

24 80 04 

22 3159136 

2008032

449 

20 1601 

21 1896201 

2227171 


499 

24 90 01 

22 3383079 

2004008 

450

20 25 00 

21 2132034 

2222222 


500

25 00 00 

22.3606798 

2000000 





No.

Square

Square Root

Reciprocal 

.00 

501

25 10 01 

22 3830293 

1996008 

502 

25 20 04 

22.4053565 

1992032 

503

25 30 09

22.4276615

1988072 

504 

25 4016 

22.4499443 

1984127 

505 

25 50 25 

22.4722051 

1980198 

506

25 6036 

22.4944438 

1976285 

507 

25 70 49 

22 5166605. 

1972387 

508

25 80 64 

22.5388553 

1968504 

509 

25 90 81 

22.5610283

1964637 

510 

26 01 00 

22 5831796 

1960784 

511 

26 11 21 

22.6053091 

1956947 

512

26 2144 

22.6274170 

1953125 

513 

26 31 69 

22.6495033 

1949318 

514

26 41 96 

22.6715681 

1945525 

515

26 52 25

22.6936114 

1941748 

516 

26 62 56

22.7156334 

1937984 

517

26 72 89 

22.7376340 

1934236 

518

26 83 24 

22.7596134 

1930502 

519 

26 93 61 

22.7815715 

1926782 

520

27 04 00 

22 8035085 

1923077 

521

27 14 41 

22 8254244 

1919386 

522 

27 24 84 

22.8473193 

1915709 

523 

27 35 29 

22.8691933 

1912046 

524 

2745 76 

22,8910463 

1908397 

525 

27 56 25 

22.9128785 

1904762 

526 

27 66 76 

22 9346899 

1901141 

527 

27 77 29

22.9564806 

1897533 

528 

27 87 84 

22.9782506 

1893939 

529 

27 98 41

23.0000000

1890359

530

28 09 00

23.0217289

1886792 

531 

28 19 61 

23.0434372 

1883239 

532 

28 30 24 

23.0651252 

1879699 

533 

2840 89 

23.0867928 

1876173 

534

28 51 56

23.1084400

1872659 

535

28 62 25

23.1300670

1869159 

536 

28 72 96 

23.1516738

1865672 

537

28 83 69

23.1732605

1862197

538

28 94 44

23.1948270

1858736 

539

29 0521 

23.2163735 

1855288 

540

291600 

23.2379001 

1851852 

541

29 2681 

23,2594067 

1848429 

542

2937 64 

23.2808935 

1845018 

543

294849 

23.3023604 

1841621 

544

29 5936 

23.3238076 

1838235 

545

29 7025 

23.3452351 

1834862 

546

2981 16 

23.3666429 

1831502 

547 

29 9209 

23.3880311

1828154

548 

3003 04 

23.4093998 

1824818 

549 

3014 01 

23.4307490 

1821494 

550

30 25 00 

23.4520788 

1818182 


No. 

Square 

Square Root 

Reciprocal 

551 

30 36 01 

23.4733892 

1814882 

552 

30 47 04 

23 4946802 

1811594 

553 

30 58 09 

23.5159520

1808318 

554 

30 69 16 

23.5372046

1805054

555 

30 80 25 

23.5584380

1801802 

556 

30 91 36 

23.5796522

1798561

557 

31 02 49 

23.6008474 

1795332 

558

31 13 64 

23 6220238 

1792115 

559

31 24 81 

23 6431808 

1788909 

560

31 36 00 

23.6643191 

1785714 

561

31 47 21 

23.6854386 

1782531 

562 

31 58 44 

23.7065392 

1779359 

563

31 69 69 

23 7276210 

1776199 

564 

31 80 96 

23.7486842 

1773050 

565 

31 92 25 

23.7697286 

1769912 

566 

32 03 56 

23.7907545 

1766784

567 

32 14 89 

23 8117618 

1763668 

568 

32 26 24 

23 8327506 

1760563 

569 

32 37 61 

23.8537209 

1757469 

570 

32 49 00 

23 8746728 

1754386 

571 

32 60 41 

23.8956063 

1751313 

572 

32 7184 

23 9165215 

1748252 

573 

32 83 29 

23.9374184 

1745201 

574 

32 94 76 

23.9582971 

1742160 

575 

33 06 25 

23.9791576

1739130 

576 

33 17 76 

24.0000000

1736111

577

33 29 29 

24.0208243 

1733102 

578 

33 40 84 

24 0416306 

1730104 

579 

33 52 41

24.0624188 

1727116 

580 

33 6400 

24 0831891 

1724138 

581

33 75 61 

24 1039416 

1721170 

582

33 87 24 

24 1246762 

1718213 

583 

33 98 89 

24 1453929 

1715266 

584 

34 10 56 

24 1660919 

1712329 

585 

34 22 25 

24 1867732 

1709402 

586 

34 33 96 

24 2074369 

1706485 

587 

34 45 69 

24.2280829 

1703578 

588

34 57 44 

24 2487113 

1700680 

589 

34 6921 

24.2693222 

1697793 

590 

34 81 00 

24 2899156 

1694915 

591 

34 92 81 

24 3104916 

1692047 

592 

35 04 64 

24 3310501 

1689189 

593 

351649 

24 3515913 

1686341 

594 

35 28 36 

24 3721152 

1683502 

595 

35 4025 

24 3926218 

1680672 

596 

35 52 16 

24 4131112 

1677852 

597 

35 64 09 

24 4335834 

1675042 

598 

35 76 04 

24 4540385 

1672241 

599

35 88 01 

24 4744765 

1669449 

600 

36 00 00

24.4948974

1666667










601 36 12 01 24.5153013

602 36 24 04 24.5356883

603 36 36 09 24.5560583

604 36 48 16 24.5764115

605 36 60 25 24.5967478 

606 36 72 36 24.6170673

607 36 84 49 24.6373700 

608 36 96 64 24.6576560 

609 37 08 81 24.6779254 

610 372100 24.6981781 

611 37 3321 24 7184142 

612 37 45 44 24.7386338

613 37 57 69 24.7588368

614 37 69 96 24.7790234 

615 37 8225 24.7991935 

616 37 94 56 24.8193473

617 38 06 89 24.8394847 

618 38 19 24 24.8596058 

619 3831 61 24.8797106 

620 38 44 00 24.8997992 

621 38 5641 24.9198716 

622 38 68 84 24.9399278 

623 38 8129 24.9599679 

624 3893 76 24.9799920 

625 39 06 25 25.0000000 

626 39 18 76 25.0199920

627 39 31 29 25.0399681

628 39 43 84 25.0599282 

629 39 5641 25.0798724 

630 39 6900 25.0998008 

631 39 81 61 25 1197134 

632 39 94 24 25.1396102 

633 40 06 89 25.1594913 

634 40 19 56 25.1793566 

635 40 32 25 25.1992063

636 40 44 96 25.2190404 

637 40 57 69 25.2388589 

638 40 70 44 25.2586619 

639 40 83 21 25.2784493

640 40 96 00 25.2982213

641 41 08 81 25.3179778

642 4121 64 25.3377189 

643 4134 49 25.3574447 

644 41 47 36 25.3771551

645 4160 25 25.3968502 

646 41 73 16 25.4165301 

647 4186 09 25.4361947 

648 419904 25.4558441 

649 421201 25.4754784 

650 42 25 00 25.4950976


Reciprocal

.00 


1663894 

1661130

1658375 

1655629 

1652893

1650165 

1647446 

1644737 

1642036 

1639344 

1636661 

1633987 

1631321 
1628664

1626016

1623377 

1620746 

1618123

1615509 

1612903 

1610306 

1607717 

1605136 

1602564 

1600000 

1597444

1594896

1592357 

1589825

1587302

1584786

1582278 

1579779 

1577287

1574803 

1572327 

1569859

1567398 

1564946 

1562500 

1560062

1557632 

1555210 

1552795 

1550388

1547988 

1545595 

1543210 

1540832 
1538462













No. 

Square 

Square Root 

Reciprocal 

.00 

No.

Square

Square Root

Reciprocal 

.00 

701 

4914 01 

26.4764046 

1426534 

751 

56 40 01 

27.4043792

1331558 

702 

49 28 04 

26.4952826 

1424501 

752 

56 55 04

27.4226184

1329787 

703 

49 42 09 

26.5141472 

1422475 

753

56 70 09

27.4408455 

1328021 

704 

49 56 16 

26.5329983 

1420455 

754 

568516 

27.4590604 

1326260 

705 

49 70 25 

26.5518361 

1418440 

755 

57 00 25 

27.4772633 

1324503 

706 

49 84 36 

26.5706605

1416431 

756 

571536 

27.4954542 

1322751 

707 

49 98 49 

26.5894716 

1414427 

757 

57 30 49

27.5136330 

1321004 

708 

50 12 64 

26.6082694 

1412429 

758 

57 45 64 

27.5317998 

1319261 

709 

50 26 81 

26.6270539 

1410437

759

57 60 81

27.5499546

1317523

710 

50 41 00 

26 6458252 

1408451 

760 

57 7600 

27.5680975 

1315789 

711 

50 55 21 

26 6645833 

1406470 

761 

57 91 21 

27.5862284 

1314060 

712 

50 69 44 

26.6833281 

1404494 

762 

58 0644 

27.6043475 

1312336 

713 

50 83 69

26.7020598 

1402525 

763 

58 21 69 

27.6224546 

1310616

714 

50 97 96 

26.7207784 

1400560 

764 

58 36 96 

27.6405499 

1308901 

715 

51 12 25 

26.7394839 

1398601 

765 

58 5225 

27.6586334 

1307190 

716 

51 26 56 

26.7581763 

1396648 

766 

58 6756 

27.6767050 

1305483 

717 

51 40 89 

26 7768557 

1394700 

767 

58 82 89

27.6947648

1303781 

718 

51 55 24 

26.7955220 

1392758 

768 

58 98 24

27.7128129 

1302083 

719 

51 69 61

26 8141754 

1390821 

769 

59 13 61 

27.7308492 

1300390 

720 

51 84 00 

26.8328157 

1388889 

770 

59 29 00 

27.7488739 

1298701 

721 

51 98 41 

26.8514432 

1386963 

771 

59 4441 

27.7668868 

1297017 

722 

52 12 84 

26.8700577 

1385042 

772 

59 5984 

27.7848880 

1295337 

723 

52 27 29 

26.8886593 

1383126 

773 

59 7529 

27.8028775

1293661 

724 

52 41 76 

26.9072481 

1381215 

774 

59 9076 

27.8208555 

1291990 

725 

52 56 25 

26.9258240 

1379310 

775 

60 0625 

27.8388218 

1290323 

726 

52 70 76 

26.9443872 

1377410 

776 

60 2176 

27.8567766 

1288660 

727 

52 85 29 

26.9629375 

1375516 

777 

60 37 29 

27.8747197 

1287001 

728 

52 99 84 

26.9814751 

1373626 

778

60 52 84

27.8926514

1285347

729 

53 14 41

27 0000000 

1371742 

779 

60 6841 

27.9106715 

1283697

730 

53 29 00 

27.0185122 

1369863 

780 

60 84 00 

27.9284801 

1282051 

731 

53 43 61 

27.0370117 

1367989 

781 

60 99 61 

27.9463772 

1280410 

732 

53 58 24 

27.0554985 

1366120 

782 

61 15 24 

27.9642629 

1278772 

733 

53 72 89 

27.0739727 

1364256 

783 

61 3089 

27.9821372 

1277139 

734 

53 87 56 

27.0924344

1362398

784 

61 46 56 

28.0000000 

1275510 

735 

54 02 25 

27.1108834 

1360544 

785 

61 6225 

28.0178515 

1273885 

736 

54 16 96 

27.1293199 

1358696

786 

61 7796 

28.0356915 

1272265 

737 

54 31 69 

27.1477439 

1356852 

787 

61 93 69 

28.0535203 

1270648 

738 

5446 44 

27.1661554 

1355014 

788 

62 0944 

28.0713377 

1269036 

739 

54 61 21 

27.1845544

1353180 

789 

622521 

28.0891438 

1267427

740 

5476 00 

27,2029410 

1351351 

790 

6241 00 

28.1069386 

1265823 

741 

54 90 81 

27,2213152 

1349528 

791 

62 56 81 

28.1247222 

1264223 

742 

55 05 64 

27.2396769 

1347709 

792 

62 72 64 

28.1424946 

1262626 

743 

55 20 49 

27,2580263 

1345895 

793 

62 8849 

28.1602557 

1261034 

744 

55 35 36 

27.2763634 

1344086 

794 

63 0436 

28.1780056 

1259446 

745 

55 5025 

27.2946881 

1342282

795 

63 20 25 

28.1957444 

1257862 

746 

55 65 16 

27.3130006 

1340483

796

63 36 16

28.2134720 

1256281 

747 

55 80 09 

27.3313007 

1338688

797 

63 5209 

28.2311884

1254705 

748 

55 95 04 

27.3495887 

1336898


798 

63 68 04 

28.2488938

1253133 

749 

56 10 01 

27.3678644

1335113


799 

63 84 01 

28.2665881

1251564 

750 

56 25 00 

27.3861279 

1333333 


800 

64 00 00 

28.2842713 

1250000 


















No. Square Square Root Reciprocal 


901 81 18 01 30 0166620 1109878 

902 81 36 04 30 0333148 1108647 

903 81 54 09 30.0499584 1107420 

904 817216 30.0665928 1106195 

905 81 90 25 30.0832179 1104972

906 82 08 36 30.0998339 1103753 

907 82 26 49 30.1164407 1102536 

908 82 44 64 30.1330383 1101322 

909 82 62 81 30.1496269 1100110 

910 82 81 00 30 1662063 1098901 

911 82 99 21 30.1827765 1097695 

912 83 17 44 30.1993377 1096491 

913 83 35 69 30.2158899 1095290 

914 83 53 96 30.2324329 1094092 

915 83 72 25 30 2489669 1092896 

916 83 90 56 30.2654919 1091703

917 84 08 89 30 2820079 1090513 

918 84 27 24 30.2985148 1089325 


919 84 45 61 

920 84 64 00 

921 84 82 41 

922 85 00 84 

923 85 19 29 

924 85 37 76 

925 85 56 25 

926 85 74 76 

927 85 93 29 

928 8611 84 

929 86 30 41 

930 86 49 00 

931 86 67 61 

932 86 86 24 

933 87 04 89 

934 87 23 56 

935 87 42 25 

936 87 60 96 

937 87 79 69 

938 87 98 44 

939 8817 21 

940 88 36 00 

941 88 54 81 

942 88 73 64 

943 88 92 49 

944 8911 36 

945 89 30 25 

946 89 4916 

947 89 68 09 

948 89 87 04 

949 90 06 01 

950 90 25 00 


30.3150128 1088139
30.3315018 1086957 
30.3479818 1085776 


30 3644529 
30 3809151 
30.3973683 

30.4138127 

30,4302481 

30.4466747 

30.4630924 

30.4795013 

30.4959014 

30 5122926 
30.5286750 
30.5450487 

30.5614136 
30 5777697 
30.5941171 

30.6104557 
30 6267857 
30.6431069

30.6594194 

30.6757233 

30.6920185 

30.7083051 

30.7245830 

30.7408523 

30.7571130 
30 7733651 
30.7896086

30.8058436

30.8220700


No. Square Square Root 


951 90 44 01 30.8382879 1051525

952 90 63 04 30.8544972 1050420

953 90 82 09 30.8706981 1049318

954 91 01 16 30 8868904 1048218 

955 91 20 25 30 9030743 1047120 

956 91 39 36 30.9192497 1046025 


957 91 58 49 

958 91 77 64 

959 91 96 81 

960 92 16 00 ^ 

961 92 35 21 

962 92 54 44 

963 92 73 69 

964 92 92 96 

965 93 12 25 


1084599 

1083424 

1082251 

1081081 

1079914 

1078749 

1077586 

1076426 

1075269 

1074114 

1072961 

1071811 

1070664 

1069519 

1068376 

1067236 

1066098 

1064963 

1063830 

1062699 

1061571 

1060445 

1059322 

1058201 

1057082 

1055966

1054852 

1053741 

1052632


30.9354166 
30.9515751
30.9677251

30.9838668 
31 0000000 
31.0161248

31.0322413 
31 0483494 
31.0644491 


1044932
1043841
1042753 

1041667 

1040583 

1039501 

1038422 

1037344 

1036269 


966 93 31 56 31.0805405 1035197 

967 93 50 89 31.0966236 1034126

968 93 70 24 31.1126984 1033058

969 93 89 61 31.1287648 1031992

970 94 09 00 31.1448230 1030928

971 94 28 41 31.1608729 1029866


972 94 47 84

973 94 67 29

974 94 86 76

975 95 06 25

976 95 25 76

977 95 45 29

978 95 64 84 

979 95 84 41 

980 96 04 00 

981 96 23 61 

982 96 43 24 

983 96 62 89 

984 96 82 56 

985 97 02 25 

986 97 21 96 

987 97 41 69 

988 97 61 44 

989 97 81 21 

990 98 01 00 

991 98 20 81 

992 98 40 64 

993 98 60 49 

994 98 80 36 

995 99 00 25 

996 99 2016 

997 99 40 09 

998 99 60 04 

999 99 80 01 

1000 100 00 00 


31.1769145 1028807 
31.1929479 1027749 
31.2089731 1026694 

31 2249900 1025641 
31.2409987 1024590 
31.2569992 1023541 

31.2729915 1022495 
31.2889757 1021450 
31.3049517 1020408 

31.3209195 1019368 
31.3368792 1018330 
31,3528308 1017294 


31.3687743 

31.3847097 

31.4006369 

31.4165561 

31.4324673 

31.4483704 

31.4642654 

31.4801525 

31.4960315 

31 5119025 
31 5277655 
31.5436206 

31.5594677

31.5753068 

31.5911380 


1016260 

1015228 

1014199 

1013171 

1012146 

1011122 

1010101

1009082 

1008065 

1007049 

1006036 

1005025 

1004016 

1003009 

1002004 


31 6069613 1001001 
31.6227766 1000000 
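
The entries of the foregoing table may be checked by direct computation. What follows is a minimal Python sketch and is not part of the original table; the names `group_pairs` and `table_row` are hypothetical, and the grouping of the Square column into pairs of digits is an assumption drawn from the appearance of the printed entries. Each row gives n, its square, its square root rounded to seven decimals, and the seven digits of 1/n that follow the printed `.00`.

```python
import math

def group_pairs(digits: str) -> str:
    """Split a digit string into groups of two, counting from the right."""
    parts = []
    while digits:
        parts.append(digits[-2:])
        digits = digits[:-2]
    return " ".join(reversed(parts))

def table_row(n: int) -> tuple[str, str, str, str]:
    """Return (No., Square, Square Root, Reciprocal) for 101 <= n <= 1000."""
    square = group_pairs(str(n * n))
    # Square root to 7 decimals: take 8 decimals exactly with integer
    # arithmetic, then round half up on the eighth.
    root = (math.isqrt(n * 10**16) + 5) // 10
    root_text = f"{root // 10**7}.{root % 10**7:07d}"
    # Reciprocal digits after the ".00" prefix: 10**9 / n, rounded to nearest.
    recip = (2 * 10**9 + n) // (2 * n)
    return str(n), square, root_text, f"{recip:07d}"

print(table_row(171))
print(table_row(625))
```

For example, `table_row(171)` returns `('171', '2 92 41', '13.0766968', '5847953')`, which agrees with the row for 171. Integer arithmetic is used so that the final digit does not depend on floating-point rounding.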










APPENDIX P 

Table of Logarithms 
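
Each entry of this table appears to be the six-digit mantissa of the common logarithm of a four-digit number: N supplies the first three digits, the column heading the fourth, and D the difference between neighbouring entries. The following minimal Python sketch, which is not part of the original table, may be used to check an entry; the name `mantissa` is hypothetical, and floating-point `math.log10` is assumed to be accurate enough for six places.

```python
import math

def mantissa(n: int) -> int:
    """Six-digit mantissa of log10(n), returned as an integer."""
    log = math.log10(n)
    # Keep only the fractional part and round it to six decimal places.
    return round((log - math.floor(log)) * 10**6)

# Row N = 160, columns 0 and 1, and the tabular difference D between them.
print(mantissa(1600), mantissa(1601), mantissa(1601) - mantissa(1600))
```

The printed line reads `204120 204391 271`, matching the first two entries and the D value of the row for N = 160.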


001301 001734 002166 

5609 6038 6466 

9876 010300 010724 

014100 4521 4940 

8284 8700 9116 


003891 432 

8174 428 
012415 424 
6616 420 
020775 416 

4896 412 
8978 408 
033021 404 

7028 400 
040998 397 

044932 393 
8830 390 
052694 386 
6524 383 
060320 379 

4083 376 
7815 373 
071514 370 
5182 368 
8819 363 

082426 360 
6004 357 
9552 355 
093071 352 

6562 349 

100026 346 
3462 343 
6871 341 

110253 338 
3609 335 

116940 333 
120245 330 
3525 328 
6781 325 

130012 323 

3219 321
6403 318 
9564 316 
142702 314 
5818 311 

148911 309 
151982 307 
5032 305 
8061 303 
161068 301 

4055 299 
7022 297 
9968 295 
172895 293 
5802 291 

178689 289 
181558 287 
4407 285 
7239 283 
190051 281 

2846 279

5623 278

8382 276
201124 274
3848 272







2 

3 

n 

5 

6 

7 

8 

9 

a 

160 

204120 204391 

204663 

204934 

205204 

205475 

205746 

206016 

206286 

206556 

271 

1 

6826 7096 

7365 

7634 

7904 

8173 

8441 

8710 

8979 

9247 

269 

2 

9515 9783 

210051 

210319 

210586 

210853 

211121 

211388 

211654 

211921 

267 

3 

212188 212454 

2720 

2986 

3252 

3518 

3783 

4049 

4314 

4579 

266 

4 

4844 5109 

5373 

5638 

5902 

6166 

6430 

6694 

6957 

7221 

264

165 

7484 7747 

8010 

8273 

8536 

8798 

9060

9323 

9585 

9846 

262 

6 

220108 220370 

220631 

220892 

221153 

221414 

221675 

221936 

222196 

222456 

261 

7 

2716 2976 

3236 

3496 

3755 

4015 

4274 

4533 

4792 

5051 

259 

8 

5309 5568 

5826 

6084 

6342 

6600 

6858 

7115 

7372 

7630 

258 

9 

7887 8144 

8400 

8657 

8913 

9170 

9426 

9682 

9938 

230193 

256 

170 

230449 230704 

230960 

231215 

231470 

231724 

231979 

232234 

232488 

232742 

255 

1 

2996 3250 

3504 

3757 

4011 

4264 

4517 

4770 

5023

5276 

253 

2 

5528 5781 

6033 

6285 

6537 

6789 

7041 

7292 

7544 

7795 

252 

3 

8046 8297 

8548 

8799 

9049 

9299 

9550 

9800 

240050 

240300 

250 

4 

240549 240799 

241048 

241297

241546 

241795 

242044 

242293 

2541 

2790 

249 

175 

3038 3286 

3534 

3782

4030 

4277 

4525 

4772 

5019 

5266 

248 

6 

5513 5759 

6006 

6252 

6499 

6745 

6991

7237 

7482 

7728 

246 

7 

7973 8219 

8464 

8709 

8954 

9198 

9443 

9687 

9932 

250176 

245 

8 

250420 250664 

250908 

251151 

251395 

251638 

251881 

252125 

252368 

2610 

243 

9 

2853 3096 

3338 

3580 

3822 

4064 

4306 

4548 

4790 

5031 

242 

180 

255273 255514

255755 

255996 

256237 

256477 

256718 

256958 

257198

257439 

241 

1 

7679 7918 

8158 

8398 

8637 

8877 

9116 

9355 

9594 

9833 

239 

2 

260071 260310 

260548 

260787

261025 

261263 

261501 

261739 

261976 

262214 

238 

3 

2451 2688 

2925 

3162

3399 

3636 

3873 

4109 

4345 

4582 

237 

4 

4818 5054 

5290 

5525

5761 

5996 

6232 

6467

6702 

6937 

235 

185 

7172 7406 

7641 

7875 

8110 

8344 

8578 

8812 

9046 

9279 

234 

6 

9513 9746 

9980 

270213 

270446 

270679 

270912 

271144 

271377 

271609 

233 

7 

271842 272074 

272306 

2538 

2770 

3001 

3233 

3464 

3696 

3927 

232 

8 

4158 4389 

4620 

4850 

5081 

5311 

5542 

5772 

6002 

6232 

230 

9 

6462 6692 

6921 

7151 

7380 

7609 

7838 

8067 

8296 

8525 

229 

190 

278754 278982 

279211 

279439 

279667 

279895 

280123 

280351 

280578 

280806 

228 

1 

281033 281261 

281488 

281715 

281942

282169 

2396 

2622 

2849 

3075 

227 

2 

3301 3527 

3753 

3979 

4205

4431 

4656 

4882 

5107 

5332 

226 

3 

5557 5782 

6007 

6232 

6456 

6681 

6905 

7130 

7354 

7578 

225 

4 

7802 8026 

8249 

8473 
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7787 

7841 

7895 

54 

9 

7949 

8002 

8056 

8110 

8163 

8217 

8270 

8324 

8378 

8431 

54 

810 

908485 

908539 

908592

908646

908699

908753 

908807 

908860 

908914 

908967 

54 

1 

9021 

9074 

9128 

9181 

9235 

9289

9342 

9396 

9449 

9503 

54 

2 

9556 

9610

9663

9716

9770

9823

9877 

9930 

9984 

910037 

53 

3 

910091 

910144 

910197 

910251 

910304 

910358 

910411 

910464 

910518 

0571 

53 

4 

0624 

0678 

0731 

0784 

0838 

0891

0944 

0998 

1051 

1104 

53 

815 

1158 

1211 

1264 

1317 

1371 

1424 

1477 

1530 

1584 

1637 

53 

6 

1690 

1743 

1797 

1850 

1903 

1956 

2009 

2063 

2116 

2169 

53 

7 

2222 

2275 

2328 

2381 

2435 

2488 

2541 

2594 

2647 

2700 

53 

8 

2753 

2806

2859 

2913 

2966 

3019 

3072 

3125 

3178 

3231 

53 

9 

3284 

3337

3390 

3443 

3496 

3549 

3602 

3655 

3708 

3761 

53 


B 

4 

2 

3 

4 

5 

6 

7 

3 

9 

B. 
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N. 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

D. 

940 

973128 

973174 

973220 

973266 

973313 

973359 

973405 

973451 

973497 

973543 

46 

1

3590 

3636 

3682 

3728 

3774 

3820 

3866 

3913 

3959 

4005 

46 

2 

4051 

4097 

4143 

4189 

4235 

4281 

4327 

4374 

4420 

4466 

46 

3 

4512 

4558 

4604 

4650 

4696 

4742 

4788 

4834 

4880 

4926 

46 

4 

4972 

5018 

5064 

5110 

5156 

5202 

5248 

5294 

5340 

5386 

46 

945 

5432 

5478 

5524 

5570 

5616 

5662 

5707 

5753 

5799 

5845 

46 

6 

5891 

5937 

5983 

6029 

6075 

6121 

6167 

6212 

6258

6304 

46 

7 

6350 

6396 

6442 

6488 

6533 

6579 

6625 

6671 

6717 

6763 

46 

8 

6808 

6854 

6900 

6946 

6992 

7037 

7083 

7129 

7175 

7220 

46 

9 

7266 

7312 

7358 

7403 

7449 

7495 

7541 

7586 

7632 

7678 

46 

950 

977724 

977769 

977815 

977861 

977906 

977952 

977998 

978043 

978089 

978135 

46 

1 

8181 

8226 

8272 

8317 

8363 

8409 

8454 

8500 

8546 

8591 

46 

2 

8637 

8683 

8728 

8774 

8819 

8865 

8911 

8956 

9002 

9047 

46 

3 

9093 

9138 

9184 

9230 

9275 

9321 

9366 

9412 

9457 

9503 

46 

4 

9548 

9594 

9639 

9685 

9730 

9776 

9821 

9867 

9912 

9958 

46 

955

980003 

980049 

980094 

980140 

980185 

980231

980276 

980322 

980367 

980412 

45 

6 

0458 

0503 

0549 

0594 

0640 

0685 

0730 

0776 

0821 

0867 

45 

7 

0912 

0957 

1003 

1048 

1093 

1139 

1184 

1229 

1275 

1320 

45 

8 

1366 

1411 

1456 

1501 

1547 

1592 

1637 

1683 

1728 

1773 

45 

9 

1819 

1864 

1909 

1954 

2000 

2045 

2090 

2135 

2181 

2226 

45 

960

982271 

982316 

982362 

982407 

982452 

982497 

982543 

982588 

982633 

982678 

45 

1 

2723 

2769 

2814 

2859 

2904 

2949 

2994 

3040 

3085 

3130 

45 

2 

3175 

3220 

3265 

3310 

3356 

3401 

3446 

3491 

3536 

3581 

45 

3 

3626 

3671 

3716 

3762 

3807 

3852 

3897 

3942 

3987 

4032 

45 

4 

4077 

4122 

4167 

4212 

4257 

4302 

4347 

4392 

4437 

4482 

45 

965 

4527 

4572 

4617 

4662 

4707 

4752 

4797 

4842 

4887 

4932 

45 

6 

4977 

5022 

5067 

5112 

5157 

5202 

5247 

5292 

5337 

5382 

45 

7 

5426 

5471 

5516 

5561 

5605 

5651 

5696 

5741 

5786 

5830 

45 

8 

5875 

5920 

5965 

6010 

6055 

6100 

6144 

6189 

6234 

6279 

45 

9 

6324 

6369 

6413 

6458 

6503 

6548 

6593 

6637 

6682 

6727 

45 

970 

986772 

986817 

986861 

986906 

986951 

986996 

987040 

987085 

987130 

987175 

45 

1 

7219 

7264 

7309 

7353 

7398 

7443 

7488 

1 7532 

7577 

7622 

45 

2 

7666 

7711 

7756 

7800 

7845 

7890 

7934 

7979 

8024 

8068 

45 

3 

8113 

8157 

8202 

8247 

8291 

8336 

8381 

8425 

8470 

8514 

45 

4 

,8559 

8604 

8648 

8693 

8737 

8782 

8826 

8871 

8916 

8960 

45 

975 

’9005 

9049 

9094 

9138 

9183 

9227 

9272 

9316 

9361 

9405 

45 

6 

9450 

9494 

9539 

9583 

9628 

9672 

9717 

9761 

9806 

9850 

44 

7 

9395 

9939 

9983 

990028 

990072 

, 990117 

990161 

990206 

990250 

990294 

44 

8 

990339 

990383 

990428 

0472 

0516 

0561 i 

0605 

0650 

0694 I 

0738 

44 

9 

0783 

0827 

0871 

0916 

j 0960 

1004 

1049 

1093 

1137 I 

1182 

44 

980 

991226 

991270 

; 991315 

991359 

991403 

991448 

991492 

991536 

991580 ^ 

991625 

44 

1 

1669 

1713 

, 1758 

1802 

1846 

1890 1 

1935 

1979 ! 

2023 i 

2067 

44 

2 

2111 

2156 

2200 

2244 

2288 

2333 

2377 

2421 i 

2465 i 

2509 

44 

3 

2554 

2598 

2642 

2686 

2730 

2774 

2819 

2863 1 

2907 i 

2951 

44 

4 

2995 

3039 

3083 

: 3127 

3172 

3216 

3260 

3304 

3348 1 

3392 

44 

985 

3436 

3480 

3524 

3568 

3613 

3657 

3701 

3745 

3789 

3833 

44 

6 

3877 

3921 

3965 

1 4009 

4053 

4097 

4141 

4185 ; 

4229 ! 

4273 

44 

7 1 

4317 

4361 

4405 

4449 

4493 

4537 

4581 

4625 i 

4669 

4713 

44 

8 

4757 

4801 i 

4845 

4889 

4933 

4977 

5021 

5065 

5108 

5152 

44 

9 

5196 

5240 

5284 

5328 

5372 

5416 : 

5460 

5504 

5547 

5591 

44 

990 

995635 

995679 

995723 

995767 

995811 

995854 

995898 

995942 ■ 

995986 

996030 

44 

1 

6074 

6117 

6161 i 

6205 i 

6249 

6293 

6337 i 

6380 i 

6424 

6468 

44 

2 

6512 

6555 

6599 

6643 

6687 i 

6731 

6774 1 

6818 

6862 

6906 

44 

3 

6949 

6993 

7037 

7080 

7124 

7168 

7212 * 

7255 

7299 

7343 

44 

4 

7386 

7430 

7474 

7517 

7561 

7605 

7648 

7692 : 

7736 

7779 

44 

995 

7823 

7867 

7910 

7954 

7998 

8041 

8085 

8129 

8172 

8216 

44 

6 

8259 

8303 

8347 

8390 

8434 

8477 

8521 

8564 

8608 

8652 

44 

7 

8695 

8739 

8782 

8826 

8869 

8913 

8956 

9000 ; 

9043 

9087 

44 

8 

9131 

9174 

9218 

9261 

9305 

9348 

9392 

9435 i 

9479 

9522 

44 

9 

9565 

9609 

9652 

9696 

9739 

9783 

9826 

9870 : 

9913 

9957 

43 

N. 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

B. 
APPENDIX Q

Glossary of Symbols and Formulae

For the convenience of the reader, the more important symbols and
formulae are listed below. The arrangement is alphabetical, and where
Greek characters are shown, they are arranged according to the way they
are pronounced in English. Occasionally a given symbol has more than
one meaning, but the meaning intended is indicated by the context.

In formulae having to do with multiple and partial correlation, various
combinations of subscripts are possible with $a$, $b$, $\beta$, $d$, $R$, $r$, $\sigma_c$, $\sigma_s$.

The general practice in this glossary is to give the formula for a specific
combination, such as $r_{12.34}$. The reader can easily supply the corresponding
formulae for $r_{13.24}$, $r_{14.23}$, etc.


Greek Alphabet

Greek    English      Greek    English      Greek    English
Α α      Alpha        Ι ι      Iota         Ρ ρ      Rho
Β β      Beta         Κ κ      Kappa        Σ σ      Sigma
Γ γ      Gamma        Λ λ      Lambda       Τ τ      Tau
Δ δ      Delta        Μ μ      Mu           Υ υ      Upsilon
Ε ε      Epsilon      Ν ν      Nu           Φ φ      Phi
Ζ ζ      Zeta         Ξ ξ      Xi           Χ χ      Chi
Η η      Eta          Ο ο      Omicron      Ψ ψ      Psi
Θ θ      Theta        Π π      Pi           Ω ω      Omega


A = a constant in an orthogonal polynomial equation. 


A = “S Y sin : a constant i 


in a sine cosine curve. 


$a = (\Sigma_2 Y - \Sigma_1 Y)\dfrac{b - 1}{(b^n - 1)^2}$: a constant in a modified exponential curve.
The value of $Y_c - k$ when $X = 0$.

a: a constant in a Gompertz curve equation.

$\log a = (\Sigma_2 \log Y - \Sigma_1 \log Y)\dfrac{b - 1}{(b^n - 1)^2}$.

a: see a, b, c, d, e ··· (constants in a polynomial equation).

$a = \log_e \dfrac{k - y_0}{y_0}$: a constant in a Pearl-Reed (logistic) curve.

a: the value of the dependent variable ($Y_c$) in a polynomial equation
when the independent variable ($X$) is zero. For a straight line equation,

$a = \bar{Y} - b\bar{X}$; or, when $\bar{X} = 0$, $a = \dfrac{\Sigma Y}{N}$.

See also a, b, c, d, e ···: constants in a polynomial equation.

A, B: constants in a sine cosine curve. See also A and B.

A, B, C, D ···: constants of an orthogonal polynomial equation. For
general expression for these constants, see Orthogonal polynomial
equation.

a, b, c, d, e ···: constants in a polynomial equation.

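Normal equations

I. $\Sigma Y = Na + b\Sigma X + c\Sigma X^2 + d\Sigma X^3 + e\Sigma X^4$.

II. $\Sigma XY = a\Sigma X + b\Sigma X^2 + c\Sigma X^3 + d\Sigma X^4 + e\Sigma X^5$.

III. $\Sigma X^2Y = a\Sigma X^2 + b\Sigma X^3 + c\Sigma X^4 + d\Sigma X^5 + e\Sigma X^6$.

IV. $\Sigma X^3Y = a\Sigma X^3 + b\Sigma X^4 + c\Sigma X^5 + d\Sigma X^6 + e\Sigma X^7$.

V. $\Sigma X^4Y = a\Sigma X^4 + b\Sigma X^5 + c\Sigma X^6 + d\Sigma X^7 + e\Sigma X^8$.


When X values are taken as deviations from their mean:

I. $\Sigma Y = Na + c\Sigma X^2 + e\Sigma X^4$.

II. $\Sigma XY = b\Sigma X^2 + d\Sigma X^4$.

III. $\Sigma X^2Y = a\Sigma X^2 + c\Sigma X^4 + e\Sigma X^6$.

IV. $\Sigma X^3Y = b\Sigma X^4 + d\Sigma X^6$.

V. $\Sigma X^4Y = a\Sigma X^4 + c\Sigma X^6 + e\Sigma X^8$.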
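
The normal equations for a polynomial can be set up and solved mechanically. The following is a minimal sketch in Python, not part of the original text: the function names `solve_linear_system` and `fit_polynomial` are hypothetical, and a plain Gaussian elimination is assumed as the method of solution. Row k of the system reads ΣX^k·Y = aΣX^k + bΣX^(k+1) + ···, with ΣX^0 taken as N.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Bring the largest remaining entry of this column to the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_polynomial(xs, ys, degree):
    """Return [a, b, c, ...] of Yc = a + bX + cX^2 + ... from the normal equations."""
    # Power sums: sum of X^k for k = 0 .. 2*degree (the k = 0 sum is N).
    power_sums = [sum(x ** k for x in xs) for k in range(2 * degree + 1)]
    # Right-hand sides: sum of X^k * Y for k = 0 .. degree.
    cross_sums = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(degree + 1)]
    matrix = [[power_sums[i + j] for j in range(degree + 1)] for i in range(degree + 1)]
    return solve_linear_system(matrix, cross_sums)


# Illustrative data lying exactly on Yc = 1 + 2X + 3X^2.
xs = [-2, -1, 0, 1, 2]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, 2)
```

Because the X values in this illustration are deviations from their mean, the odd power sums vanish and the system separates exactly as in the second set of equations.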

a, b, k: constants in a modified exponential equation, or in Gompertz or
logistic equation employing modified exponential form. See also a, b,
and k.

$a_{1.234} = \bar{X}_1 - b_{12.34}\bar{X}_2 - b_{13.24}\bar{X}_3 - b_{14.23}\bar{X}_4$: the computed value of the
dependent variable ($X_{c1.234}$) when the independent variables ($X_2$, $X_3$,
$X_4$) are zero.

$a_{1.234}$, $b_{12.34}$, $b_{13.24}$, $b_{14.23}$: constants in multiple estimating equation.


Normal equations

I. $\Sigma X_1 = Na_{1.234} + b_{12.34}\Sigma X_2 + b_{13.24}\Sigma X_3 + b_{14.23}\Sigma X_4$.

II. $\Sigma X_1X_2 = a_{1.234}\Sigma X_2 + b_{12.34}\Sigma X_2^2 + b_{13.24}\Sigma X_2X_3 + b_{14.23}\Sigma X_2X_4$.

III. $\Sigma X_1X_3 = a_{1.234}\Sigma X_3 + b_{12.34}\Sigma X_2X_3 + b_{13.24}\Sigma X_3^2 + b_{14.23}\Sigma X_3X_4$.

IV. $\Sigma X_1X_4 = a_{1.234}\Sigma X_4 + b_{12.34}\Sigma X_2X_4 + b_{13.24}\Sigma X_3X_4 + b_{14.23}\Sigma X_4^2$.

or

II. $\Sigma x_1x_2 = b_{12.34}\Sigma x_2^2 + b_{13.24}\Sigma x_2x_3 + b_{14.23}\Sigma x_2x_4$.

III. $\Sigma x_1x_3 = b_{12.34}\Sigma x_2x_3 + b_{13.24}\Sigma x_3^2 + b_{14.23}\Sigma x_3x_4$.

IV. $\Sigma x_1x_4 = b_{12.34}\Sigma x_2x_4 + b_{13.24}\Sigma x_3x_4 + b_{14.23}\Sigma x_4^2$.

$AD = \dfrac{\Sigma|x|}{N}$, where $\Sigma|x|$ means sum of deviations from mean, signs neglected:
average deviation or mean deviation, a measure of absolute
dispersion.

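Aggregative index number formulae (subscripts to P's and Q's in terms
below are for purposes of identification of recurring formulae):

Simple aggregative:

$P = \dfrac{\Sigma p_n}{\Sigma p_0}$.

Weighted aggregative (general form):

$P = \dfrac{\Sigma p_n q}{\Sigma p_0 q}$;  $Q = \dfrac{\Sigma q_n p}{\Sigma q_0 p}$.

Base year weights: 1

$P_1 = \dfrac{\Sigma p_n q_0}{\Sigma p_0 q_0}$;  $Q_1 = \dfrac{\Sigma q_n p_0}{\Sigma q_0 p_0}$.

Given year weights: 2

$P_2 = \dfrac{\Sigma p_n q_n}{\Sigma p_0 q_n}$;  $Q_2 = \dfrac{\Sigma q_n p_n}{\Sigma q_0 p_n}$.

Marshall-Edgeworth:

$P = \dfrac{\Sigma p_n(q_0 + q_n)}{\Sigma p_0(q_0 + q_n)} = \dfrac{\Sigma p_n q_{0,n}}{\Sigma p_0 q_{0,n}}$.

Average year weights:

$P = \dfrac{\Sigma p_n q_{0-n}}{\Sigma p_0 q_{0-n}}$.

Keynes' common factor:

$P = \dfrac{\Sigma p_n q_c}{\Sigma p_0 q_c}$.

"Ideal" index number formula: 3

$P_3 = \sqrt{P_1 \times P_2} = \sqrt{\dfrac{\Sigma p_n q_0}{\Sigma p_0 q_0} \times \dfrac{\Sigma p_n q_n}{\Sigma p_0 q_n}}$;  $Q_3 = \sqrt{Q_1 \times Q_2} = \sqrt{\dfrac{\Sigma q_n p_0}{\Sigma q_0 p_0} \times \dfrac{\Sigma q_n p_n}{\Sigma q_0 p_n}}$.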
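
As an illustration of the aggregative price formulae, the sketch below computes the base year weighted index (formula 1), the given year weighted index (formula 2), the Marshall-Edgeworth index, and the "ideal" index (formula 3) as the geometric mean of the first two. It is a minimal Python sketch with hypothetical names (`aggregate`, `price_indexes`) and made-up prices and quantities, not data from the text.

```python
from math import sqrt


def aggregate(prices, quantities):
    """Sum of price times quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def price_indexes(p0, pn, q0, qn):
    """Aggregative price indexes from base (0) and given (n) year prices and quantities."""
    base_weighted = aggregate(pn, q0) / aggregate(p0, q0)    # formula 1
    given_weighted = aggregate(pn, qn) / aggregate(p0, qn)   # formula 2
    combined = [a + b for a, b in zip(q0, qn)]               # q0 + qn
    marshall_edgeworth = aggregate(pn, combined) / aggregate(p0, combined)
    ideal = sqrt(base_weighted * given_weighted)             # formula 3
    return {
        "base_weighted": base_weighted,
        "given_weighted": given_weighted,
        "marshall_edgeworth": marshall_edgeworth,
        "ideal": ideal,
    }


# Two hypothetical commodities.
indexes = price_indexes(p0=[1.0, 2.0], pn=[1.5, 2.0], q0=[10, 5], qn=[8, 6])
```

The corresponding quantity indexes follow by interchanging the roles of the p's and q's in the call.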

Alienation coefficient: see $k$.

α: a type of measure describing a frequency distribution. The α's may
be computed from the π's, as indicated below; or, in a similar manner,
from the μ's.

$\alpha_1 = \dfrac{\pi_1}{\sigma} = 0$.

$\alpha_2 = \dfrac{\pi_2}{\sigma^2} = 1$.

$\alpha_3 = \dfrac{\pi_3}{\sigma^3} = \sqrt{\beta_1}$: a measure of relative skewness. $\alpha_3$ is zero for
a normal curve.

$\alpha_4 = \dfrac{\pi_4}{\sigma^4} = \dfrac{\pi_4}{\pi_2^2} = \beta_2$: a measure of relative kurtosis. $\alpha_4$ is 3 for
a normal curve.
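
The α measures may be illustrated numerically. The sketch below, in Python, computes the first four moments about the mean for ungrouped data and from them α₃ and α₄. The function names are hypothetical, and no Sheppard's correction is applied since the data are not grouped.

```python
def moments_about_mean(values):
    """Return (pi_1, pi_2, pi_3, pi_4): moments about the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return tuple(sum((v - mean) ** k for v in values) / n for k in (1, 2, 3, 4))


def alpha_measures(values):
    """Return (alpha_3, alpha_4): relative skewness and relative kurtosis."""
    _, pi2, pi3, pi4 = moments_about_mean(values)
    sigma = pi2 ** 0.5
    return pi3 / sigma ** 3, pi4 / sigma ** 4


# A small symmetrical series: alpha_3 is zero, and alpha_4 = 6.8 / 2**2 = 1.7.
alpha3, alpha4 = alpha_measures([1, 2, 3, 4, 5])
```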

Arithmetic mean: see $\bar{X}$.

Average : see Measure of central tendency. 

Average deviation: see AD. 

Average of relatives index number formulae: 

Simple arithmetic mean: 



Simple harmonic mean: 



Simple geometric mean: 



Weighted arithmetic mean (general form) : 



where v is p X 


Arithmetic mean, base year value weights . o4 



(Same as 1.) 

Arithmetic mean, mixed weights 5 



(Same as 2 .) 
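Harmonic mean, given year value weights: 6

$P_6 = \dfrac{\Sigma p_n q_n}{\Sigma\left[(p_n q_n)\dfrac{p_0}{p_n}\right]}$;  $Q_6 = \dfrac{\Sigma q_n p_n}{\Sigma\left[(q_n p_n)\dfrac{q_0}{q_n}\right]}$.

(Same as 2.)

Harmonic mean, mixed weights: 7

$P_7 = \dfrac{\Sigma p_n q_0}{\Sigma\left[(p_n q_0)\dfrac{p_0}{p_n}\right]}$;  $Q_7 = \dfrac{\Sigma q_n p_0}{\Sigma\left[(q_n p_0)\dfrac{q_0}{q_n}\right]}$.

(Same as 1.)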

Weighted geometric means: 


^ (qnPo) ^ 
y.n 


'EnX 

^PoJ 


X 


(!)■ 


X 


where v ~ 

"Ideal" index number formulae:


where v — qoP- 


X 


$P_8 = \sqrt{P_4 \times P_6}$

or $\sqrt{P_5 \times P_7}$.


12 


(Same as 3.) 


$Q_8 = \sqrt{Q_4 \times Q_6}$
or $\sqrt{Q_5 \times Q_7}$.


$B = \dfrac{12\,\Sigma X_1Y}{N(N^2 - 1)}$: a constant in an orthogonal polynomial equation.

See also Xi. 


B = y cos : a constant in a si 


sme cosine curve. 


$b = \sqrt[n]{\dfrac{\Sigma_3 Y - \Sigma_2 Y}{\Sigma_2 Y - \Sigma_1 Y}}$: a constant in a modified exponential equation, the
ratio between successive first differences.

$b = \sqrt[n]{\dfrac{\Sigma_3 \log Y - \Sigma_2 \log Y}{\Sigma_2 \log Y - \Sigma_1 \log Y}}$: a constant in a Gompertz curve equation.

- logg — a constant in a Pearl-Reed (logistic) curve. 

b: the slope of the line in a polynomial equation. See a, b, c, d, e ···
(constants in a polynomial equation). For a straight line equation,

$b = \dfrac{\Sigma XY - \bar{X}\Sigma Y}{\Sigma X^2 - \bar{X}\Sigma X}$; or, when $\bar{X} = 0$, $b = \dfrac{\Sigma XY}{\Sigma X^2}$.
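
The straight line constants can be computed directly from the sums. The sketch below is a minimal Python illustration, assuming the least squares forms b = (ΣXY − X̄ΣY)/(ΣX² − X̄ΣX) and a = Ȳ − bX̄; the function name `fit_straight_line` and the data are hypothetical.

```python
def fit_straight_line(xs, ys):
    """Return (a, b) of the least squares line Yc = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # Slope: (sum XY - mean X * sum Y) / (sum X^2 - mean X * sum X).
    b = (sum_xy - mean_x * sum(ys)) / (sum_xx - mean_x * sum(xs))
    # Intercept: value of Yc when X is zero.
    a = mean_y - b * mean_x
    return a, b


# Points lying exactly on Yc = 1 + 2X.
a, b = fit_straight_line([1, 2, 3, 4], [3, 5, 7, 9])
```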


6 = ^ (where cTp refers to the population) : h<jr gives a fiduciary limit for <rp 
$b_{xy} = \dfrac{\Sigma xy}{\Sigma y^2} = r\dfrac{\sigma_x}{\sigma_y}$: slope of estimating equation $X_c = a' + b'Y$.
$b_{yx} = \dfrac{\Sigma xy}{\Sigma x^2} = r\dfrac{\sigma_y}{\sigma_x}$: slope of estimating equation $Y_c = a + bX$.


$b_{12.34} = r_{12.34}\dfrac{\sigma_{s1.234}}{\sigma_{s2.134}}$: a coefficient of partial estimation. See also $a_{1.234}$,
$b_{12.34}$, $b_{13.24}$, $b_{14.23}$ (constants in a multiple estimating equation).

β: a criterion of frequency curve type. The β's may be computed from
the π's, as indicated below; or, in a similar manner, from the μ's.

$\beta_1 = \alpha_3^2 = \dfrac{\pi_3^2}{\pi_2^3}$: a measure of relative skewness. Value of $\beta_1$ is 0 for a
normal curve.

$\sqrt{\beta_1} = \alpha_3 = \dfrac{\pi_3}{\sigma^3} = \dfrac{\pi_3}{\sqrt{\pi_2^3}}$: a measure of relative skewness.

$\beta_2 = \alpha_4 = \dfrac{\pi_4}{\sigma^4} = \dfrac{\pi_4}{\pi_2^2}$: a measure of relative kurtosis. Value of
$\beta_2$ is 3 for a normal curve.


Value of 


$\beta_{12.34} = b_{12.34}\dfrac{\sigma_2}{\sigma_1}$: a beta coefficient. A measure of the individual importance
of one of three independent variables.

Binomial: $(p + q)^m$ for fitting to discrete data, symmetrical if $p = q$,
otherwise skewed.


Binomial theorem:

$(a + b)^m = a^m + ma^{m-1}b + \dfrac{m(m-1)}{1 \cdot 2}a^{m-2}b^2 + \dfrac{m(m-1)(m-2)}{1 \cdot 2 \cdot 3}a^{m-3}b^3 + \cdots$.
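
The successive terms of the binomial expansion can be generated as follows. This is a brief Python sketch (the function name `binomial_terms` is hypothetical); with a = q and b = p the terms are the probabilities of 0, 1, 2, ··· successes, and they sum to 1.

```python
from math import comb


def binomial_terms(a, b, m):
    """Terms of (a + b)**m in order: a^m, m a^(m-1) b, ..."""
    return [comb(m, k) * a ** (m - k) * b ** k for k in range(m + 1)]


# Coefficients of (a + b)^4, and a symmetrical binomial with p = q = 1/2.
coefficients = binomial_terms(1, 1, 4)
probabilities = binomial_terms(0.5, 0.5, 6)
```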


$C = \sqrt{\dfrac{\chi^2}{N + \chi^2}}$: coefficient of mean square contingency, a measure of
correlation of qualitatively classified data.

C: computed value. Used only as a subscript in this sense.

C: cyclical movement.

$C = \dfrac{180\,\Sigma X_2Y}{N(N^2 - 1)(N^2 - 4)}$: a constant in an orthogonal polynomial
equation.

Ss(2i\r - S |s|) 

N2 


, where 3 refers to the smaller of each pair of items, 


and when each series is expressed as deviations from its mean in terms 
of its average deviation: first moment correlation coefficient, 
c: a correction factor. Formula depends on what is being corrected.
c: change in the slope of a polynomial equation. See a, b, c, d, e ···
(constants in a polynomial equation).

Camp-Meidell inequality: $P < \dfrac{1}{2.25\,(x/\sigma)^2}$, where $P$ is the proportion of
items beyond any distance $x$ on both sides of the mean. Applies to
uni-modal distribution if mode is within $1\sigma$ of the mean.

%: same as Sk^. x is not used in this text with this meaning, 

χ: $\sqrt{\chi^2}$.

χ²: a measure of the discrepancy between observed values and theoretical
values.

$\chi^2 = \Sigma\left[\dfrac{(f - f_c)^2}{f_c}\right]$

$= \dfrac{\left(a - \dfrac{p}{q}b\right)^2}{\dfrac{p}{q}N}$, where $a$ = number of occurrences of first category;
$b$ = number of occurrences of second category; $p$ = probability of
obtaining an occurrence of first category; $q$ = probability of obtaining
an occurrence of second category.

$\chi^2 = \dfrac{N\sigma^2}{\sigma_p^2}$, where $\sigma_p^2$ refers to the variance in the population.
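
The computation of χ² from observed and computed frequencies is sketched below in Python. The function names are hypothetical. The two-category function uses the algebraically equivalent form (aq − bp)²/(Npq), which is assumed here for convenience; it gives the same value as summing (f − f_c)²/f_c over the two categories.

```python
def chi_square(observed, computed):
    """Sum of (f - fc)^2 / fc over all classes."""
    return sum((f - fc) ** 2 / fc for f, fc in zip(observed, computed))


def chi_square_two_categories(a, b, p):
    """Chi-square for a occurrences of the first category and b of the second."""
    q = 1 - p
    n = a + b
    return (a * q - b * p) ** 2 / (n * p * q)


# 60 and 40 occurrences where 50 and 50 are expected (p = q = 1/2).
general = chi_square([60, 40], [50, 50])
two_way = chi_square_two_categories(60, 40, 0.5)
```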


Compound interest curve: see Exponential equation.
Correlation, coefficient of: see $r$.

Correlation, first moment coefficient: see $C_2$.
Correlation, index of: see $\rho$.

Correlation, multiple: see $R_{1.234}$.

Correlation, part: see $_{12}r_{34}$.

Correlation, partial: see $r_{12.34}$.

Correlation ratio: see $\eta$.

Cyclical-irregular movements: $C \times I$.

D: a decile. There are nine deciles, Di • • • D9. 

D: a difference between paired values. 


$D = \dfrac{2800\,\Sigma X_3Y}{N(N^2 - 1)(N^2 - 4)(N^2 - 9)}$: a constant in an orthogonal
polynomial equation.

d: a constant in a polynomial equation. See a, b, c, d, e ··· (constants in
a polynomial equation).

= X ~ a deviation from an assumed mean. 


d' = 



a deviation from an assumed mean in units of class intervals 


df2 ; 


! 34 = : a coefiicient of separate determination. 

Δ: a finite difference.

$\Delta^1$ is a first difference; $\Delta^2$ is a second difference.





$\Delta_1$, as used in computation of the mode, is the arithmetic difference between
the frequency of the modal class and the frequency of the preceding
class; $\Delta_2$ is the arithmetic difference between the frequency of
the modal class and the frequency of the following class.
Deseasonalized data: $T \times C \times I$.
Determination, coefficient of multiple: see $R^2_{1.234}$.

Determination, coefficient of partial: see $r^2_{12.34}$.

Determination, coefficient of separate: see $d_{12.34}$.

Determination, coefficient of simple: see $r^2$.
Determination, index of: see $\rho^2$.

Determination, ratio of: see $\eta^2$.

Dispersion:

For absolute dispersion see $\sigma$; $\sigma^2 = \pi_2$; AD; Q.

For relative dispersion see V.

e: a constant in a polynomial equation. See a, b, c, d, e ··· (constants
in a polynomial equation).

e = 2.71828: the base of the Naperian, or natural, logarithmic system.
The limit of $\left(1 + \dfrac{1}{n}\right)^n$. ($\log_e X = 2.30259 \log_{10} X$.)


7} (correlation ratio) : see 

mP *1 ^ \ 

( 7 k - F)2j S 7j - FZr 


772 = 


2(7 — Fp — "" — ^7 2 _ yXY — ‘ determination, 

A measure of relationship when data are grouped along X-axis and 
line of means is taken as estimating line. 


$\bar{\eta}^2 = \dfrac{\eta^2(N - 1) - (m - 1)}{N - m}$: estimated population value of $\eta^2$.


Explained sum of squares: 

For simple correlation (linear and non-linear) : 

$\Sigma Y_c^2 = a\Sigma Y + b\Sigma XY + c\Sigma X^2Y + d\Sigma X^3Y + \cdots$.

For simple correlation grouped data in units of class intervals (devia- 
tions are from assumed means) : 

S/. d'y + 62 /d', d', + • • • . 

For multiple correlation: 

$\Sigma X_{c1.234}^2 = a_{1.234}\Sigma X_1 + b_{12.34}\Sigma X_1X_2 + b_{13.24}\Sigma X_1X_3 + b_{14.23}\Sigma X_1X_4$.
Explained variance: see

Explained variation (see also Explained sum of squares):

For simple correlation:

$\Sigma y_c^2 = \Sigma(Y_c - \bar{Y})^2 = \Sigma Y_c^2 - \bar{Y}\Sigma Y$.

For simple correlation when data are in deviation form:

$\Sigma y_c^2 = b\Sigma xy + c\Sigma x^2y + d\Sigma x^3y + \cdots$.

For simple correlation, grouped data (see also Variation between col- 
umns): 

SM = hfy (d'y,r - 

For simple correlation, grouped data in units of class intervals: 

= s/. 

For multiple correlation: 

$\Sigma x_{c1.234}^2 = \Sigma X_{c1.234}^2 - \bar{X}_1\Sigma X_1$.

For multiple correlation when data are in deviation form:

$\Sigma x_{c1.234}^2 = b_{12.34}\Sigma x_1x_2 + b_{13.24}\Sigma x_1x_3 + b_{14.23}\Sigma x_1x_4$.

Exponential equation:

$Y_c = ab^X$, or $\log Y_c = \log a + X \log b$.

Compound interest curve is often written $P_n = P_0(1 + r)^n$.
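
The logarithmic form of the exponential equation shows that it can be fitted as a straight line to the logarithms of Y. The Python sketch below illustrates this under that assumption (least squares applied to log Y, which is one common procedure and not necessarily the only one), together with the compound interest expression. The function names and figures are hypothetical.

```python
import math


def compound_value(p0, r, n):
    """Value after n periods at rate r: P_n = P_0 (1 + r)^n."""
    return p0 * (1 + r) ** n


def fit_exponential(xs, ys):
    """Fit Yc = a * b**X by a least squares straight line through (X, log Y)."""
    n = len(xs)
    logs = [math.log10(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sum_x_log = sum(x * lg for x, lg in zip(xs, logs))
    sum_xx = sum(x * x for x in xs)
    log_b = (sum_x_log - mean_x * sum(logs)) / (sum_xx - mean_x * sum(xs))
    log_a = mean_log - log_b * mean_x
    return 10 ** log_a, 10 ** log_b


# A series growing 5 per cent per period from 100.
xs = [0, 1, 2, 3, 4]
ys = [compound_value(100, 0.05, x) for x in xs]
a, b = fit_exponential(xs, ys)
```

Here b corresponds to 1 + r of the compound interest form, and a to the initial value.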

^2 

F == I— where a\ is larger variance. 

^2 

5-2 «• o 

F = ^ 

^ll 234 

\ Ni 

jPi [2] = — — ; the normal curve function. 

\<r/ crv27r 

Charlier series. Cumulative frequencies for Gram-Charlier second 
approximation curve may be obtained by integration: 

f: number of observations in a class. $\Sigma f = N$.
$f_c$: a computed frequency.

Factoring, formula for factoring the quadratic equation, $a + bX + cX^2 = 0$:

$X = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2c}$.

Frequency curve type, criteria of: see as; a^; /C 2 . 

$G = \sqrt[N]{X_1 \cdot X_2 \cdot X_3 \cdots X_N}$: geometric mean, a measure of central tendency.
Geometric mean: see G.
Gompertz curve equation:
$Y_c = ka^{b^X}$, or
$\log Y_c = \log k + (\log a)\,b^X$.


$H = \dfrac{N}{\dfrac{1}{X_1} + \dfrac{1}{X_2} + \cdots + \dfrac{1}{X_N}}$: harmonic mean, a measure of central tendency.
Alternate forms of this expression are shown in Chapter IX.
Harmonic mean: see H.
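
The geometric and harmonic means may be computed as in the following Python sketch. The function names are hypothetical; the geometric mean is taken through logarithms, which is equivalent to the N-th root of the product for positive values.

```python
import math


def geometric_mean(values):
    """N-th root of the product of N positive values, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def harmonic_mean(values):
    """N divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / v for v in values)


g = geometric_mean([2, 8])   # square root of 16
h = harmonic_mean([2, 6])    # 2 / (1/2 + 1/6)
```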

I: irregular movements.
$i = l_2 - l_1$: the class interval.

Index number formulae: see Aggregative index number formulae; Average
of relatives index number formulae.

Individual importance of independent variables: see ^1234; ^12.34; i8i2.34; 

^12 34 } 12^34. 

Infinity: ∞

I.Q. : Intelligence quotient. 

K: column. 

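k: number of samples (used in connection with criterion of likelihood).

$k = \dfrac{1}{n}\left[\Sigma_1 Y - \dfrac{\Sigma_2 Y - \Sigma_1 Y}{b^n - 1}\right] = \dfrac{1}{n}\left[\dfrac{\Sigma_1 Y\,\Sigma_3 Y - (\Sigma_2 Y)^2}{\Sigma_1 Y + \Sigma_3 Y - 2\Sigma_2 Y}\right]$: the asymptote of a modified exponential
equation.

k: the asymptote of a Gompertz curve equation.

$\log k = \dfrac{1}{n}\left[\Sigma_1 \log Y - \dfrac{\Sigma_2 \log Y - \Sigma_1 \log Y}{b^n - 1}\right] = \dfrac{1}{n}\left[\dfrac{\Sigma_1 \log Y\,\Sigma_3 \log Y - (\Sigma_2 \log Y)^2}{\Sigma_1 \log Y + \Sigma_3 \log Y - 2\Sigma_2 \log Y}\right]$.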
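
The partial totals procedure for a modified exponential curve Y_c = k + ab^X can be sketched as follows. This Python illustration assumes the series is divided into three groups of n consecutive items each, with X = 0 at the first item, and uses b^n = (Σ₃Y − Σ₂Y)/(Σ₂Y − Σ₁Y), a = (Σ₂Y − Σ₁Y)(b − 1)/(b^n − 1)², and k = (1/n)[Σ₁Y − (Σ₂Y − Σ₁Y)/(b^n − 1)]. The function name and data are hypothetical.

```python
def fit_modified_exponential(ys):
    """Return (k, a, b) of Yc = k + a * b**X using three partial totals."""
    if len(ys) % 3 != 0:
        raise ValueError("the series must divide into three equal groups")
    n = len(ys) // 3
    s1, s2, s3 = sum(ys[:n]), sum(ys[n:2 * n]), sum(ys[2 * n:])
    ratio = (s3 - s2) / (s2 - s1)          # this ratio equals b**n
    if ratio <= 0:
        raise ValueError("partial totals do not follow a modified exponential")
    b = ratio ** (1 / n)
    a = (s2 - s1) * (b - 1) / (ratio - 1) ** 2
    k = (s1 - (s2 - s1) / (ratio - 1)) / n
    return k, a, b


# Nine items lying exactly on Yc = 100 - 50 (0.8)^X.
series = [100 - 50 * 0.8 ** x for x in range(9)]
k, a, b = fit_modified_exponential(series)
```

A Gompertz curve may be handled by the same function applied to the logarithms of Y, the results then being log k, log a, and b.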


h - I : the asymptote of a Tearl-Reed (logistic) 

ym — yt v & / 

equation. 


$k = \dfrac{\sigma_{ys}}{\sigma_y}$: coefficient of alienation.

$k^2 = \dfrac{\sigma_{ys}^2}{\sigma_y^2} = \dfrac{\Sigma y_s^2}{\Sigma y^2}$: coefficient of non-determination.

[$r^2 + k^2 = 1$.]

$\kappa_2 = \dfrac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)}$: a measure of the departure
from normal of a frequency distribution. For a normal distribution,
the value of $\kappa_2 = 0$.

Kurtosis: see $\alpha_4$; $\beta_2$.


L = 


X X • ' * X of 

|(fff + erf + ••• + a-i) 


: criterion of likelihood, the ratio of the geo- 


metric mean of several standard deviations to their arithmetic mean. 
Where samples vary in size, weighted means should be used. See 
Chapter XIII. 

l: the limit of a class.

$l_1$ is the lower limit; $l_2$ is the upper limit.

Lag, distribution of, by weighted moving average placed opposite 1 . 
Weight formula: 


Xi + 2X2 + 3X3 + 4X4 + - * + NX^ 


Likelihood, criterion of : see L. 

Log: logarithm. 

Logarithmic normal curve: a normal curve using log X values. See Normal 
curve, and Chapter XI. 

Logistic curve (see also Pearl-Reed cinrve) : 

i 


m: number of constants in an equation, number of columns in classified
data, or number of strata in a stratified sample.
m: the exponent of a binomial expression.

Mean: see $\bar{X}$, G, H.

Mean deviation: see AD.

Mean square contingency, coefficient of: see C.

Measure of central tendency: see $\bar{X}$, Med, Mo, G, H.

Med: median, a measure of central tendency. Med = 

Median: see Med. 


$Mo = l_1 + \left(\dfrac{\Delta_1}{\Delta_1 + \Delta_2}\right)i$,

$= \bar{X} - \sigma\,Sk$, also

$= \bar{X} - 3(\bar{X} - Med)$ [an empirical approximation]: mode, a measure
of central tendency.
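
The location of the mode within the modal class of a frequency distribution is illustrated by the Python sketch below. It assumes the interpolation form Mo = l₁ + [Δ₁/(Δ₁ + Δ₂)]i, with Δ₁ and Δ₂ the differences between the modal frequency and the frequencies of the preceding and following classes. The function name and figures are hypothetical.

```python
def grouped_mode(lower_limit, class_interval, f_preceding, f_modal, f_following):
    """Interpolate the mode within the modal class of a frequency distribution."""
    delta_1 = f_modal - f_preceding   # excess over the preceding class
    delta_2 = f_modal - f_following   # excess over the following class
    return lower_limit + delta_1 / (delta_1 + delta_2) * class_interval


# Modal class 20 to 30 with frequency 14; neighboring frequencies 8 and 10.
mode = grouped_mode(20, 10, 8, 14, 10)
```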



Mode: see Mo. 

Modified exponential equation: $Y_c = k + ab^X$.

Modified polynomial equation: Yc == a + hX^ + * * * , 

Moment: see ν, π, μ.

μ: moment around the mean with Sheppard's correction. (The μ's and
π's are in units of class intervals.)

$\mu_1 = \pi_1 = 0$.

$\mu_2 = \pi_2 - \tfrac{1}{12}$.
$\mu_3 = \pi_3$.

$\mu_4 = \pi_4 - \tfrac{1}{2}\pi_2 + \tfrac{7}{240}$.

Multiple correlation coefficient: $R_{1.234}$. See also $R^2_{1.234}$.

Multiple determination, coefficient of: see $R^2_{1.234}$.
Multiple estimating equation:

$X_{c1.234} = a_{1.234} + b_{12.34}X_2 + b_{13.24}X_3 + b_{14.23}X_4$, or
$x_{c1.234} = b_{12.34}x_2 + b_{13.24}x_3 + b_{14.23}x_4$.

See also $a_{1.234}$, $b_{12.34}$, $b_{13.24}$, $b_{14.23}$ (constants in a multiple estimating
equation).

N: number of items in a sample. In a frequency distribution $N = \Sigma f$.
$N_k$: number of items in a column.

$N_s$: number of items in a stratum of a sample.

$N'$: number of items in k samples (used in connection with criterion of
likelihood).

n: current or given year of an index. Used only as a subscript in this
manner.

n: number of degrees of freedom.

n: number of observations in a group, when partial totals are used in fitting
a modified exponential curve, a Gompertz curve, or a logistic curve
of modified exponential form. Also, number of items between selected
points in fitting a Pearl-Reed (logistic) curve.
n: number of years or periods in the compound interest expression
$P_n = P_0(1 + r)^n$.

Non-determination coefficient: see $k^2$.
Normal curve:


Yc 


Ni . 


Normal curve with adjustment for skewness (first two terms of Gram- 
Charlier series) : 



v: a moment about an assumed mean. 
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Pi = : Iferst moment about an assumed mean. 

j ,2 = — ; second moment about an assumed mean. 

Vs = - -jy, — : third moment about an assumed mean. 

I fourth moment about an assumed mean. 

0: base year of an index. Used only as a subscript in this manner.
Original data: $T \times C \times S \times I$.

Orthogonal polynomial equation:

$Y_c = A + BX_1 + CX_2 + DX_3 + \cdots$.

$X_{(r+1)} = X_1X_r - \dfrac{r^2(N^2 - r^2)}{4(4r^2 - 1)}X_{(r-1)}$.

Coefficient of $X_r = \dfrac{(2r)!\,(2r + 1)!}{(r!)^4} \cdot \dfrac{\Sigma X_rY}{N(N^2 - 1)(N^2 - 4) \cdots (N^2 - r^2)}$.

N is the number of years or months and r is the degree of the polynomial.
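
The construction of the orthogonal polynomial values may be illustrated as follows. The Python sketch assumes N equally spaced items, X₁ taken as deviations from the middle of the series, X₀ = 1, and the recurrence X₍ᵣ₊₁₎ = X₁Xᵣ − [r²(N² − r²)/(4(4r² − 1))]X₍ᵣ₋₁₎. The function name is hypothetical, and exact fractions are used so that the orthogonality can be checked without rounding.

```python
from fractions import Fraction


def orthogonal_polynomials(n_items, max_degree):
    """Return the columns X_1 .. X_max_degree for n_items equally spaced periods."""
    middle = Fraction(n_items + 1, 2)
    x1 = [Fraction(t) - middle for t in range(1, n_items + 1)]
    columns = [[Fraction(1)] * n_items, x1]          # X_0 and X_1
    for r in range(1, max_degree):
        factor = Fraction(r * r * (n_items ** 2 - r * r), 4 * (4 * r * r - 1))
        following = [x1[i] * columns[r][i] - factor * columns[r - 1][i]
                     for i in range(n_items)]
        columns.append(following)
    return columns[1:max_degree + 1]


def dot(u, v):
    """Sum of products of two columns."""
    return sum(a * b for a, b in zip(u, v))


# Seven periods: X_2 is 5, 0, -3, -4, -3, 0, 5 and its sum of squares is 84.
x1, x2, x3 = orthogonal_polynomials(7, 3)
```

Since the columns are mutually orthogonal, each constant is simply ΣXᵣY divided by ΣXᵣ², and a term can be added to the equation without recomputing the earlier constants.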

P: a percentile. There are 99 percentiles, $P_1 \cdots P_{99}$.

P: population (when used as a subscript). Also, the number of items in
the population.

$P_n = P_0(1 + r)^n$: population at end of period.

$P_0$: population at beginning of period.

P: price index number. See Aggregative index number formulae; Average
of relatives index number formulae.

$P_n$: price index number for a given year. ($P_{36}$: price index number for
1936, and similarly for other years.)

P: probability of obtaining a deviation of a given magnitude, or the ratio
of the area of the tail (or tails) of a frequency distribution to the entire
area under consideration.

$P_s$: number of items in a stratum of the population.
p: a percentage or a proportion.


$p_{1+2} = \dfrac{N_1p_1 + N_2p_2}{N_1 + N_2}$: the estimate of the percentage in the population
made by averaging the percentages in the two samples.
p: the probability of obtaining a success.


p == — for a binomial expression, where m is the exponent, or the 


number of possible happenings minus 1.
p: price of a commodity.

$p_n$: price of a commodity in given year.
$p_0$: price of a commodity in base year.

Parabolic equation:

$Y_c = aX^b$, or $\log Y_c = \log a + b \log X$.

Part correlation coefficient: see $_{12}r_{34}$.

Partial correlation coefficient: $r_{12.34}$. See also $r^2_{12.34}$.

Partial determination, coefficient of: see $r^2_{12.34}$.

$PE = .6745\,\sigma$: probable error.

$PE_{\bar{X}} = .6745\,\sigma_{\bar{X}}$: probable error of the arithmetic mean.
Pearl-Reed curve: a type of logistic curve (see also Logistic curve).

$Y_c = \dfrac{k}{1 + e^{a + bX}}$ (symmetrical curve).

$Y_c = \dfrac{k}{1 + e^{a + bX + cX^2}}$ (asymmetrical curve).

Periodic curve: see Sine cosine curve. 

π: 3.14159 (circumference of a circle is 2π times the radius).

π: a moment about the mean.

$\pi_1 = \dfrac{\Sigma x}{N} = 0$: first moment about the mean.

$\pi_2 = \sigma^2 = \dfrac{\Sigma x^2}{N} = \nu_2 - \nu_1^2$: second moment about the mean; variance;
a measure of absolute dispersion.

$\pi_3 = \dfrac{\Sigma x^3}{N} = \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3$: third moment about the mean, a measure
of absolute skewness.

$\pi_4 = \dfrac{\Sigma x^4}{N} = \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4$: the fourth moment around the
mean, a measure of absolute kurtosis.

Polynomial equation (see also a, b, c, d, e ···, constants in a polynomial
equation):

$Y_c = a + bX + cX^2 + dX^3 + eX^4 + \cdots$.
Price index number formulae: see Aggregative index number formulae;
Average of relatives index number formulae; Purchasing power index
number formula.

Price relative: $\dfrac{p_n}{p_0}$.

Purchasing power index number formula:


Purchasing power =






, where m =» i. 

P 


(Reciprocal of harmonic mean of price relatives weighted by base year values.)
Q: quantity index number. See Aggregative index number formulae;
Average of relatives index number formulae.

$Q_n$: quantity index number for a given year. ($Q_{36}$: quantity index number
in 1936, and similarly for other years.)

Q: a quartile. There are three quartiles: $Q_1$, $Q_2$, $Q_3$. $Q_2$ is the median.

$Q = \dfrac{Q_3 - Q_1}{2}$: quartile deviation or semi-interquartile range, a measure of
absolute dispersion.

$q = 1 - p$: a percentage or proportion; also, the probability of obtaining
a failure.

$q' = 1 - p'$. See also $p'$.

q: the quantity of a commodity (used in connection with index numbers).
$q_c$: the quantity common to two or several periods.
$q_n$: the quantity of a commodity in the given year.
$q_0$: the quantity of a commodity in base year.
$q_0 + q_n$: the total quantity in two years.

$q_{0,n} = \dfrac{q_0 + q_n}{2}$: the average quantity in two years.

$q_{0-n}$: the average quantity in several years.

Quantity index number formulae: see Aggregative index number formulae;
Average of relatives index number formulae.

Quantity relative: $\dfrac{q_n}{q_0}$.

Quartile deviation: see Q.

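$R_{1.234}$: correlation of $X_1$ with $X_{c1.234}$, coefficient of multiple correlation.
See also $R^2_{1.234}$.

$R^2_{1.23} = \dfrac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}$: a coefficient of multiple determination.

$R^2_{1.234} = \dfrac{\Sigma x_{c1.234}^2}{\Sigma x_1^2} = \dfrac{\Sigma X_{c1.234}^2 - \bar{X}_1\Sigma X_1}{\Sigma X_1^2 - \bar{X}_1\Sigma X_1}$

$= 1 - \dfrac{\sigma_{s1.234}^2}{\sigma_1^2} = 1 - \dfrac{\Sigma x_{s1.234}^2}{\Sigma x_1^2}$

$= 1 - [(1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2)]$: a coefficient of multiple
determination.

$\bar{R}^2_{1.234} = \dfrac{R^2_{1.234}(N - 1) - (m - 1)}{N - m}$: population estimate of $R^2_{1.234}$.

r: rate of change in expression $P_n = P_0(1 + r)^n$.

r: coefficient of correlation, a measure of closeness of relationship between
two variables. See also $r^2$.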
For ungrouped data:


s(^- • 

~h . O';/ ■ Cy) 

*'* ' cfx 2a;^ 2^—^^ 


= = J (^^Vf _ , / ft - fe - 

N o-./' Ar(T.(r„ ^ (2x^)(2y^) ~ ^ 

$= \dfrac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{[N\Sigma X^2 - (\Sigma X)^2]\,[N\Sigma Y^2 - (\Sigma Y)^2]}}$.

For grouped data, this last expression becomes 

zs/d^; - c2fj')C2fX) 

viN^fx {d'r - i2fxnmfu «)^ - ( 2 //^;)^] 

2/dX _ 2/xdx Z/ydy 


N 


N N 


V 


N 




useful when esti« 


mating equation is to be obtained by use of & = r — , since de- 

(Tjt 

nominator gives cr's (in class intervals). 

For other grouped data formulae, see r^. 
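
For ungrouped data, the coefficient of correlation can be computed from the original X and Y values without first taking deviations. The Python sketch below assumes the form r = [NΣXY − (ΣX)(ΣY)] / √([NΣX² − (ΣX)²][NΣY² − (ΣY)²]). The function name and data are hypothetical.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Coefficient of correlation r from the sums of the original values."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


r = correlation_coefficient([1, 2, 3], [1, 3, 2])
r_squared = r ** 2   # coefficient of determination
```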
r^: coefficient of determination, a measure of closeness of relationship be- 


tween two variables. See also r. 
For ungrouped data: 


,2 -- 


- 1 




^ = 1 
(ri 


S7I-FSF («2F + &SZF)- 
SF2 - FSF "" 


(2F)2 

N 


M = i 

Sy2 


SF2 


SF2 - 
-SFS 


(SF)2 

N 


SF2 - FSF 

For grouped data: 

2 _ 2 {yhy 2A (d;,)2 - (S/^ d;)2 -5- N 

^ivr S/, (d;)2 - (S/, 02 ^ N 

_zs/,(0)2^ (S/,02 

ZS/„ (O" - (2/, 0"‘ 

For further explanation of symbols, see Variance; Total variation; Ex- 
plained variation; Unexplained variation; Explained sum of squares. 
$\bar{r}^2 = \dfrac{r^2(N - 1) - 1}{N - 2} = \dfrac{r^2(N - 1) - (m - 1)}{N - m}$: estimate of population
value of $r^2$.

$r_{12.3} = \dfrac{r_{12} - (r_{13})(r_{23})}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$: a coefficient of partial correlation. See also
$r_{12.34}$ and $r^2_{12.34}$.
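
A first-order coefficient of partial correlation is obtained from the three simple coefficients, as in the Python sketch below. It assumes the form r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / [√(1 − r₁₃²) √(1 − r₂₃²)]; the function name and the values are hypothetical.

```python
from math import sqrt


def partial_correlation(r12, r13, r23):
    """Correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / (sqrt(1 - r13 ** 2) * sqrt(1 - r23 ** 2))


r12_3 = partial_correlation(0.8, 0.6, 0.6)   # (0.8 - 0.36) / 0.64
```

Higher-order coefficients such as r₁₂.₃₄ follow by applying the same function to first-order coefficients, for example to r₁₂.₃, r₁₄.₃, and r₂₄.₃.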

5 "i 2 34 : correlation of a;^ 2.34 with 0 : 51 . 34 . Coefficient of partial correlation. 
See also rf 2 34 . 


^ 12.34 


^ 4 . 




Cl 234 


So:." 


'Cl 34 


34 


X . 0“51 234 1. 0'52 134 

= 012.34 — , or 012 34 X 


0"52 134 


0’51 234 


“ V Z>12 34 * 621.34 

ri2.3 ^ (^14 3) (^24 3) 


^12 345 


•^1-^4 3^! 


?"12 345 


or 


ri 2 4 - (ri3 4)(r23.4) 


Vl - 






' 24 3 ' 13 4 ' 23 4 

(m-1) — [rim 345 ■ • ■ (m- l)][^2m 345 > . « (w - 1)]. 


2 

5"12 34 = 


"Vf 345 • • • (m — 1) *Vl- ^’27n 345 • • • (m — 1) 

general formula for the coefficient of partial correlation. 

_ '2fXci 234 — • '^Xci 34 __ ^Xpi 2 34 — Sxci 34 


2a;|i 34 

2 .Xci 234 SXci .34 


^x\ 34 


SZf - 
also ri 2 34 * 

H 2 1 

1 “ T 12 .ZA ~ 


2X1, ,34 


jRl 234 


: coefficient of partial determination. See 


1 — JSl.34 


^12.34 


ri2 34 (X - m + 1) - 1 , 

N -- m 


: population estimate of rf 2 . 34 . 

6i3 24 Xz — 6i 4 23Z4), coefficient of 


12 ^ 34 : correlation of Z 2 with (Zi 
part correlation. 

Range of a sine cosine curve: $2\sqrt{A^2 + B^2}$.

Ranked data, correlation of: see ρ, Spearman's formula.

$\rho = 1 - \dfrac{6\Sigma D^2}{N(N^2 - 1)}$: Spearman's formula for correlation of ranked data.
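
Spearman's formula may be applied to two sets of ranks as in the following Python sketch. It assumes ρ = 1 − 6ΣD²/[N(N² − 1)], with D the difference between paired ranks and no tied ranks; the function name and ranks are hypothetical.

```python
def spearman_rho(ranks_x, ranks_y):
    """Rank correlation from paired ranks (no ties assumed)."""
    n = len(ranks_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


agreement = spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])   # perfect agreement
reversal = spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])    # complete reversal
```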

ρ (index of non-linear correlation): see $\rho^2$.

$\rho^2$: index of determination. A measure of non-linear correlation. Formulae
are the same as those for $r^2$ which are based upon the explanation
of variance or variation. Estimating equation may have more constants,
or may be for logarithms or reciprocals of X or Y, or may be
any combination of these conditions. If logarithms of X and Y are
used, the symbols are written $\rho_{\log Y \cdot \log X}$, and similarly for other
transformations.

$\bar{\rho}^2 = \dfrac{\rho^2(N - 1) - (m - 1)}{N - m}$: estimated population value of $\rho^2$.

s: deviation from a line of estimation. Used only as a subscript in this
sense. Also used as a subscript to refer to a particular stratum in a
stratified sample.

S: seasonal movement.

Scatter ratio: anti-log of $\sigma_{\log y_s}$.

Seasonally adjusted data: see Deseasonalized data.

Semi-interquartile range: see Q.

Separate determination coefficient: see $d_{12.34}$.

Sheppard's method of unlike signs $= \cos(U \times 1.8°)$, where U is percentage of
cases of unlike sign: a measure of correlation of qualitatively classified
data.

Σ: summation sign.

ΣX; ΣY: sum of all the X or Y values.

$\sum_1^{N_k}$: summation of items 1 through $N_k$.

$\sum_1^{m}$: summation of columns 1 through m.

$\sum_1^{m}\left(\sum_1^{N_k} Y\right) = \Sigma Y$.

$\Sigma_1$, $\Sigma_2$, $\Sigma_3$: partial totals.

$\Sigma Y^2$: sum of squares of Y values.

$\Sigma Y_c^2$: see Explained sum of squares.

$\Sigma y^2$: see Total variation.

$\Sigma y_c^2$: see Explained variation.

$\Sigma y_s^2$: see Unexplained variation.

σ: standard deviation, a measure of absolute dispersion.

For ungrouped data:

$\sigma = \sqrt{\dfrac{\Sigma x^2}{N}} = \sqrt{\dfrac{\Sigma X^2}{N} - \left(\dfrac{\Sigma X}{N}\right)^2}$.

For grouped data: 
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I 

^ ~ __ estimated standard deviation in the pop 

variance, a measure of absolute dispersion. 
N 


estimated standard deviation in the population. See also o"^. 


N ~1~ N - 
For ungrouped data: 


cr^: estimated variance m the population. 


For grouped data: 


(SX)2 
N(N - 1 ) 

N(N - 1 ) 


.. (SA)2 

N 

N -1 


SY2 - JSX 
N-1 


^ - 1 NiN - 1), 


Xxl + 2x1 


: estimate of population var- 


-2 2xi + 2x2 2xi -j- 2x2 j. r 1 i.* 

i7i + 2 = TiT TT = ; : estimate of population var- 

(iVi - 1) + (N 2 - 1) ni + n2 ^ ^ 

iance made by averaging the variances of the two samples. Same as 

estimated population variance within columns when there are two 

columns. 

When 0*2 is estimated from several samples, 

-2 _ iV iCl + iV'2Cr2 + ‘ ‘ * + 

(N1+N2 + ' -+N,) -k 

$\sigma_a = \sqrt{Npq}$: standard error of number of occurrences.

^ci 234 - standard deviation of explained or computed values, multiple cor- 
relation. See also Oqi 234 . 

2x^ 

^ci 234 - ~ : explained variance, multiple correlation. See also ex- 

plained variation. 


2xci 234 


-2 

Cl-234 - m - 1' 

tiple correlation. 

1 - J 72 
“ ' /Af— — = 

\'N — m 
mation. 


: explained variance based on degrees of freedom, mul- 


standard error of the correlation ratio. A rough approxi- 


^(X — 7j^)^ — (1 — + 1 : standard error of 


An anoroximation. 
$\sigma_{\log}$: standard deviation of the logarithms of a series. May be computed
by use of the expression $\sigma_{\log} = .7413\,(\log Q_3 - \log Q_1)$ for fitting a
logarithmic normal curve.

$\sigma_{Med} = 1.2533\,\sigma_{\bar{X}}$: standard error of the median.

$\sigma_p$: standard deviation in the population.

$\sigma_p^2$: variance in the population.

$\sigma_p = \sqrt{\dfrac{pq}{N}}$: standard error of a percentage.

$\sigma_{p_1 - p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2}$: standard error of the difference between two
percentages.




P] -f 2gl-f-2 , Pl + 2gl +2 

A^i ^ N2 




Pl + 2gi-h2' 


A^l -j- 1^2 


NIN 2 * 

the standard error of the difference between two percentages when one 
estimate of the percentage in the population is made from the two 
samples. 

(XtI standard error of the coefficient of correlation. 


ar 


VN 


, a rough measure unless vp is small and N is large. 


(Tt - 


Vn 


when hypothesis is rp = 0. 


1 — 

(Tt = ' ■■ a rough measure of the samphng error of r. 




ViV - 2 
1 - Rl 234 


ViV 


m 


: standard error of coefficient of multiple correlation 


A rough approximation. 


$\sigma_{r_{12.34}}$: standard error of coefficient of partial correlation.

$\sigma_{r_{12.34}} = \dfrac{1 - r_{P12.34}^2}{\sqrt{N - m + 1}}$: a rough measure of the sampling error of $r_{12.34}$ unless $r_{P12.34}$ is small and $N$ is large.

$\sigma_{r_{12.34}} = \dfrac{1}{\sqrt{N - m + 1}}$, when hypothesis is $r_{P12.34} = 0$.

$\sigma_{r_{12.34}} = \dfrac{1 - r_{12.34}^2}{\sqrt{N - m}}$: a rough measure of the sampling error of $r_{12.34}$.

Standard error of the index of correlation. A rough approximation.



$\sigma_{s1.234}$: standard error of estimate, multiple correlation. See also $\sigma^2_{s1.234}$.

$\sigma^2_{s1.234} = \dfrac{\sum x_1^2 - \sum x_{c1.234}^2}{N} = \dfrac{\sum X_1^2 - \sum X_{c1.234}^2}{N} = \dfrac{\sum x_1^2\,(1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2)}{N} = \sigma_1^2 (1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2)$: unexplained variance, multiple correlation.


$\bar{\sigma}^2_{s1.234} = \dfrac{\sum x_{s1.234}^2}{N - m}$: estimate of unexplained variance in the population, multiple correlation.


$\sigma_\sigma = \dfrac{\sigma}{\sqrt{2N}} = .7071068\,\dfrac{\sigma}{\sqrt{N}}$: standard error of the standard deviation.


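If kurtosis is present, $\sigma_\sigma = \dfrac{\sigma}{\sqrt{2N}}\sqrt{1 + \dfrac{\beta_2 - 3}{2}}$, where $\beta_2$ is the value in the population.

$\sigma_{\sigma_1 - \sigma_2} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2}$: standard error of the difference between two standard deviations (useful when $N_1$ and $N_2$ are large).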

$\bar{\sigma}^2_{\text{strata means}} = \dfrac{\sum_{1}^{m} N_s (\bar{X}_s - \bar{X})^2}{N - 1}$: estimated population variance of strata means.


s Ns {Xs - ly 


of a stratified sample ^ 

mean of a stratified sample. 

V 


: the sampling variance of the 




$\sigma_V = \dfrac{V}{\sqrt{2N}}$: standard error of coefficient of variation.

$\sigma_{V_1 - V_2} = \sqrt{\sigma_{V_1}^2 + \sigma_{V_2}^2}$: standard error of the difference between two coefficients of variation.

O'!, cr|^ cr|^ : similar to corresponding expressions for y. 

$\sigma_1^2 = \dfrac{\sum x_1^2}{N} = \dfrac{\sum X_1^2 - \bar{X}_1 \sum X_1}{N}$: variance of the dependent variable in multiple correlation.

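$\sigma_{\bar{X}} = \dfrac{\sigma_P}{\sqrt{N}}$, or approximately $\dfrac{\bar{\sigma}}{\sqrt{N}} = \dfrac{\sigma}{\sqrt{N - 1}}$: standard error of the mean.

$\sigma_{\bar{X}_d}$: standard error of the mean of differences between paired items.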

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2}$: standard error of the difference between two means.

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\bar{\sigma}_{1+2}^2}{N_1} + \dfrac{\bar{\sigma}_{1+2}^2}{N_2}} = \sqrt{\dfrac{(N_1 + N_2)(\sum x_1^2 + \sum x_2^2)}{N_1 N_2 [(N_1 - 1) + (N_2 - 1)]}} = \sqrt{\dfrac{(N_1 + N_2)(\sum x_1^2 + \sum x_2^2)}{N_1 N_2 (n_1 + n_2)}}$: standard error of the difference between two means when one estimate of population variance is made from two samples. This expression is the same as the preceding one when $N_1 = N_2$.

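$\sigma_y^2$: variance, or total variance, a measure of absolute dispersion. See also Total variation = $\sum y^2$.


For ungrouped data:

$\sigma_y^2 = \dfrac{\sum y^2}{N} = \dfrac{\sum (Y - \bar{Y})^2}{N} = \dfrac{\sum Y^2}{N} - \left(\dfrac{\sum Y}{N}\right)^2$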


For grouped data: 




N 

\ N )i 




= + (j 


2 

$\bar{\sigma}_y^2 = \dfrac{\sum y^2}{N - 1}$: estimate of total variance in the population of $Y$, the dependent variable. When only one variable is under consideration, $\sigma$ usually has no subscript and the deviations are indicated by $x$. See $\bar{\sigma}$.

$\sigma_{y_c}^2$: explained variance. See also $\sum y_c^2$, or explained variation, and $\sum Y_c^2$, or explained sum of squares.


For ungrouped data:

$\sigma_{y_c}^2 = \dfrac{\sum y_c^2}{N} = \dfrac{\sum (Y_c - \bar{Y})^2}{N}$


For grouped data: 

a-l = {2 f Mycy 
L -Y 




$\bar{\sigma}_{y_c}^2 = \dfrac{\sum y_c^2}{m - 1}$: variance based on degrees of freedom.

$\sigma_{y_s}$: standard error of estimate. See also $\sigma_{y_s}^2$.

For ungrouped data:

$\sigma_{y_s} = \sqrt{\dfrac{\sum y_s^2}{N}} = \sqrt{\dfrac{\sum (Y - Y_c)^2}{N}} = \sqrt{\dfrac{\sum Y^2 - \sum Y_c^2}{N}} = \sqrt{\dfrac{\sum Y^2 - (a\sum Y + b\sum XY)}{N}}$

For grouped data:

^ > N 

^ JW. {dyY ~ S/y {dy)^ 

1 N 

$\sigma_{y_s} = \sigma_y \sqrt{1 - r^2}$.

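$\sigma_{y_s}^2$: unexplained variance. See also Unexplained variation = $\sum y_s^2$.

For ungrouped data:

$\sigma_{y_s}^2 = \dfrac{\sum y_s^2}{N} = \dfrac{\sum (Y - Y_c)^2}{N} = \dfrac{\sum Y^2 - \sum Y_c^2}{N} = \dfrac{\sum Y^2 - (a\sum Y + b\sum XY)}{N}$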


For grouped data: 

= .2 p/(^0^~2/(yc)^ 

2A(d;)2 2/, (d;;2-| 

N 



$\bar{\sigma}_{y_s}^2 = \dfrac{\sum y_s^2}{N - m}$: estimate of unexplained variance in the population.

$\sigma_z = \dfrac{1}{\sqrt{N - m - 1}}$: standard error of $Z$.

Sine cosine curve equation: 7c = 7 ~ ^4 sin X^ + 5 cos xj . 

Skewed curve: see Binomial; Logarithmic normal curve; Normal curve 
with adjustment for skewness. 

$Sk = \dfrac{\bar{X} - Mo}{\sigma}$, or roughly $\dfrac{3(\bar{X} - Med)}{\sigma}$: a measure of relative skewness.

$Sk_\beta = \dfrac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$: a measure of relative skewness.

$Sk_{\log} = \dfrac{\log Q_1 + \log Q_3 - 2\log Q_2}{\log Q_3 - \log Q_1}$: logarithmic measure of relative skewness.

$Sk_p = \dfrac{P_{10} + P_{90} - 2P_{50}}{P_{90} - P_{10}}$: a measure of relative skewness based upon the percentiles.

$Sk_q = \dfrac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$: a measure of relative skewness based upon the quartiles.
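The quartile and percentile measures of relative skewness can be tried out with the short sketch below. It is an illustration under stated assumptions: the function names are hypothetical, and the quartiles and percentiles come from Python's `statistics.quantiles` with its default interpolation rule, which need not match the grouped-data interpolation used in the text.

```python
from statistics import quantiles


def quartile_skewness(data):
    """(Q1 + Q3 - 2*Q2) / (Q3 - Q1); zero for a symmetrical series."""
    q1, q2, q3 = quantiles(data, n=4)
    return (q1 + q3 - 2 * q2) / (q3 - q1)


def percentile_skewness(data):
    """(P10 + P90 - 2*P50) / (P90 - P10); zero for a symmetrical series."""
    cuts = quantiles(data, n=100)  # 99 cut points: P1 ... P99
    p10, p50, p90 = cuts[9], cuts[49], cuts[89]
    return (p10 + p90 - 2 * p50) / (p90 - p10)


symmetrical = list(range(1, 102))   # 1, 2, ..., 101
long_right_tail = [1, 2, 3, 4, 100]

print(quartile_skewness(symmetrical), percentile_skewness(symmetrical))
print(quartile_skewness(long_right_tail))
```

Both measures lie between -1 and +1, and a positive value indicates a longer tail toward the high values.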
Skewness:

For absolute skewness, see $\pi_3$.

For relative skewness, see $\alpha_3 = \sqrt{\beta_1}$, $\beta_1$, $Sk$, $Sk_\beta$, $Sk_{\log}$, $Sk_p$, $Sk_q$.
Spearman's formula for correlation of ranked data: see $\rho$.

Standard deviation: see $\sigma$ and $\bar{\sigma}$.

Standard error of estimate: see $\sigma_{y_s}$, $\sigma_{s1.234}$, and $\bar{\sigma}_{s1.234}$.

Standard error of a statistical measure: see cr^, cr,2 __ rs, 0*^, dr 

^Ri.234:^ ^^12 347 <^<73 cr Cl- erg; (Ty, 0*7^ - - Xg; 

Standard score: $\dfrac{x}{\sigma}$.

Straight line equation: $Y_c = a + bX$.

T: periodicity of a time series in X units.

T: secular trend. 

t: ratio of a statistical measure which is distributed normally around a 
mean of zero to an estimate of the standard error of that measure based 
on the number of degrees of freedom present. 

_ X -Xp X-Xp 

^ ~ ^ VW’ VN -1 

Xi -X2 Xi- X2 

, or f 

C^Xi - X 2 Xi - ^2 

^ Vi - _ Vat - 2 

Vx — m Vl — 


^"12 34 Vn ~ m 

Vl — rf2 34 


Tchebycheff's inequality: $P < \dfrac{1}{(x/\sigma)^2}$, where $P$ is the proportion of items beyond any distance $x$ on both sides of the mean. Applies to any series of data.
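A minimal sketch of the bound follows, assuming the usual form of the inequality in which the proportion of items lying more than a distance x from the mean cannot exceed (sigma / x) squared. The function names are hypothetical; the second function simply counts the observed proportion so that it can be set beside the bound.

```python
from statistics import fmean, pstdev


def tchebycheff_bound(distance, sigma):
    """Upper limit on the proportion of items farther than `distance`
    from the mean, for a series whose standard deviation is `sigma`."""
    if distance <= 0 or sigma <= 0:
        raise ValueError("distance and sigma must be positive")
    # A proportion cannot exceed 1, so the bound is capped there.
    return min(1.0, (sigma / distance) ** 2)


def observed_proportion_beyond(data, distance):
    """Proportion of items lying more than `distance` from the mean."""
    mean = fmean(data)
    return sum(1 for value in data if abs(value - mean) > distance) / len(data)


series = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = pstdev(series)  # standard deviation with divisor N
print(observed_proportion_beyond(series, 2 * sigma), tchebycheff_bound(2 * sigma, sigma))
```

Because the inequality makes no assumption about the form of the distribution, the bound is usually far above the proportion actually observed.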

Total variance: see $\sigma_y^2$.

Total variation: a measure of absolute dispersion. $\sum y^2 = \sum y_c^2 + \sum y_s^2$.


For ungrouped data:

$\sum y^2 = \sum Y^2 - \dfrac{(\sum Y)^2}{N} = \sum Y^2 - \bar{Y}\sum Y$.


For grouped data: 
For grouped data in units of class intervals:

My'y = s/. {d;y - 




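For multiple and partial correlation:

$\sum x_1^2 = \sum X_1^2 - \bar{X}_1 \sum X_1$.

$\sum x_1^2 = \sum x_{c1.234}^2 + \sum x_{s1.234}^2$.

Unexplained variance: see $\sigma_{y_s}^2$.
Unexplained variation:

For simple correlation, ungrouped data:

$\sum y_s^2 = \sum (Y - Y_c)^2 = \sum Y^2 - \sum Y_c^2$, or $\sum y^2 - \sum y_c^2$.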


For grouped data: 

2/2/1 = i^Myr - my'cn = 

For grouped data in units of class intervals: 

my'sy - mvv - my'cY - WvY - 

For multiple and partial correlation: 

Sxli 234 = Sxi — 2a:ci 234 = szf- szSi . 234 . See also Explained 
sum of squares: SF^; '^fyidy^^] SX|i . 234 . See also Variation within 
columns. 

$V = \dfrac{\sigma}{\bar{X}}$: coefficient of variation, a measure of relative dispersion.


V = 5^^: relative aggregate value. 

$v = pq$: value of a commodity.

$v_n = p_n q_n$: value of a commodity in given year.

$v_0 = p_0 q_0$: value of a commodity in base year.

Variance: second moment about the mean, or square of standard deviation.
See $\sigma_y^2$, $\sigma_1^2$ = total variance.
See $\sigma_{y_c}^2$, $\sigma_{c1.234}^2$ = explained variance.
See $\sigma_{y_s}^2$, $\sigma_{s1.234}^2$ = unexplained variance.


Variance, population estimate: 

SF2 

Total 


FS7 2F" - [(SF)^ 4 ^ N] 


N - 1 


or 


N - 1 



or 



Between columns 
ml iVjj \

-X(Yjcl^rj 

Within columns ... . or 

N — m 

SF2 - [idV)" ^ Wk ] 

N — m 


Variation, coefficient of: see V.

Variation: sum of squared deviations. See Total variation; Explained 
variation; Unexplained variation; Variation between columns; Variation 
within columns. 

Variation between columns or groups (explained variation) : 

Ungrouped as to Y values: 



For data grouped as to Y values, deviations in units of class intervals: 

r/Nrr \ 2 - 


s ^ 


\ 


y\2 m 

i / 1 


L“1^J 


N ' 


Variation within columns or groups (unexplained variation) : 
Ungrouped as to Y values: 



(Same as o'? + 2 when there are two columns.) 

For data grouped as to Y values, deviations in units of class intervals: 


iftf - ■ = ^fvid'y)^ - s 

1 Li \ ^ / J 1 — ^ 


L Nk 


X: a variable, usually the independent variable; also the mid-value of a 
class in a frequency distribution. 

$X_1$: the dependent variable when there are more than two variables. Used in multiple and partial correlation.

$X_1, X_2, X_3, \ldots, X_N$: different values or observations of variable $X$. (Occasionally $X_0, X_1, X_2, \ldots, X_N$ are used for this concept, as in Appendix B, sections XII-1 and XII-2.)
$X_1, X_2, X_3$, etc.: independent variables in an orthogonal polynomial equation. For general expression for $X_r$, see Orthogonal polynomial equation.

$X_1 = X$.

$X_2 = X_1^2 - \dfrac{N^2 - 1}{12}$.

$X_3 = X_1^3 - \dfrac{3N^2 - 7}{20}\,X_1$.

$X_2, X_3, X_4$, etc.: independent variables in multiple estimating equation.

$\bar{X} = \dfrac{\sum X}{N}$: mean, a measure of central tendency. (Also the best estimate of $\bar{X}_P$.)

$\bar{X}_d$: mean of differences between paired items.

: an assumed mean.

$\bar{X}_{\log}$: mean of logarithms of $X$ values. May be computed by use of the expression $\bar{X}_{\log} = \dfrac{\log Q_1 + \log Q_3 + 1.2554 \log Q_2}{3.2554}$ for fitting a logarithmic normal curve.

$\bar{X}_P$: population mean.

$\bar{X}_s$: mean of stratum in a stratified sample.

$x = X - \bar{X}$: a deviation from the mean.

$x' = \dfrac{X - \bar{X}}{i}$: a deviation from the mean in units of class intervals.


$x_0, x_1, x_2$: selected equidistant points, used in fitting Pearl-Reed (logistic) curve.

$\dfrac{x}{\sigma}$: deviation from mean in units of standard deviations. Used also for deviation of any computed measure from its population value, from a hypothetical value, or from another computed sample value, in units of standard errors of that measure.

Y: a variable, usually the dependent variable. 

$Y_c$: a computed $Y$ value.

$Y_0$: ordinate at the mean. Maximum ordinate in case of normal curve.

$\bar{Y}_k$: the mean of a particular column.

$y = Y - \bar{Y}$.

$y' = \dfrac{Y - \bar{Y}}{i}$.

$y_0, y_1, y_2$: selected $Y$ values, used in fitting Pearl-Reed (logistic) curve.

$Z = \tfrac{1}{2}[\log_e (1 + r) - \log_e (1 - r)] = 1.15129 \log_{10} \dfrac{1 + r}{1 - r}$: a transformation of $r$ made in order to obtain an approximately normal sampling distribution.
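The two forms of the transformation can be checked against each other numerically. The sketch below is illustrative only: the function names are hypothetical, and `standard_error_of_z` assumes the expression 1 / sqrt(N - m - 1) listed in this glossary for the standard error of Z, where m is the number of constants in the estimating equation (2 for simple correlation).

```python
import math


def fisher_z(r):
    """Z = 1/2 [log_e(1 + r) - log_e(1 - r)], for -1 < r < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 0.5 * (math.log(1.0 + r) - math.log(1.0 - r))


def fisher_z_common_logs(r):
    """The same quantity by common logarithms, using the rounded constant 1.15129."""
    return 1.15129 * math.log10((1.0 + r) / (1.0 - r))


def standard_error_of_z(n, m=2):
    """1 / sqrt(N - m - 1), assumed from the glossary entry for the standard error of Z."""
    if n - m - 1 <= 0:
        raise ValueError("N must exceed m + 1")
    return 1.0 / math.sqrt(n - m - 1)


r = 0.5
print(fisher_z(r), fisher_z_common_logs(r), standard_error_of_z(28))
```

The constant 1.15129 is half of 2.30259, the natural logarithm of 10, so the two forms agree to about five decimal places.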
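$z = \log_e \bar{\sigma}_1 - \log_e \bar{\sigma}_2 = \log_e \dfrac{\bar{\sigma}_1}{\bar{\sigma}_2} = 2.30259\,(\log_{10} \bar{\sigma}_1 - \log_{10} \bar{\sigma}_2) = 2.30259 \log_{10} \dfrac{\bar{\sigma}_1}{\bar{\sigma}_2}$

$= \tfrac{1}{2}(\log_e \bar{\sigma}_1^2 - \log_e \bar{\sigma}_2^2) = \tfrac{1}{2} \log_e \dfrac{\bar{\sigma}_1^2}{\bar{\sigma}_2^2} = 1.15129\,(\log_{10} \bar{\sigma}_1^2 - \log_{10} \bar{\sigma}_2^2) = 1.15129 \log_{10} \dfrac{\bar{\sigma}_1^2}{\bar{\sigma}_2^2}$

$= 1.15129 \log_{10} \dfrac{\bar{\sigma}_{y_c}^2}{\bar{\sigma}_{y_s}^2} = 1.15129 \log_{10} \dfrac{\bar{\sigma}_{c1.234}^2}{\bar{\sigma}_{s1.234}^2}$.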
A

Adding machine, use of, 865-866
Aggregative price index numbers:
simple, 588-590 
weighted: 

approximate weights, 595-596 
average quantities, 593 
base period quantities, 592 
common factor, 593-594 
given year quantities, 593 
group weights, 603n-604n 
ideal, 594-595 
Marshall-Edgeworth, 593 
Aggregative quantity index numbers, 607-609

Alienation, coefficient of, 663n
Alphas, 256, 259-261 

American Institute of Public Opinion, sampling method of, 29-30
American Telephone and Telegraph Company Index of Industrial Activity, 643
Amplitude ratio, 522 
moving, 524 

Analysis of variance (see Variance, analysis 
of) 

Arithmetic mean (see also Modified mean) 
graphic location, frequency curve, 215, 216 
of averages, 206-207 
of grouped data: 

long method, 197-200 
open-end classes, 203-204 
short methods, 197-202 
unequal class intervals, 202-204
of percentages, 205-206 
of ungrouped data, 194-195 
properties of, 195-197 
Arithmetic progression, 101 
Arrangement: 
alphabetical, 58 
customary, 60-61 
geographical, 58-60 
historical, 60 
progressive, 61 
Array, 165-168 
Ascher, Leonard, 577 
Ashby, Lyle W., 650n 
Asymmetrical curve (see Skewed curve)
Asymmetry (see Skewness)

Average (see Central tendency; Relatives,
price, types of average)


Average deviation: 

computation of, 238-239 
of cyclical averages, 566-570 
used in index number construction, 641 
Axes, 71-72 

Ayres’ Index of State School Systems, 645-646 
Ayres, Leonard P., 644, 645

B 

Babson, Roger, 814-815 
Banerjee, Sudhir Kumar, 879 
Bar chart:

compared with simple curve, 128-129 
complex types, 128-131 
component part, 133-137 
frequency distribution column diagram, 77 
simple, 126-127 
Barlow’s Tables, 871 

Barron’s Index of Production and Trade, 639 

Base line, 81-85 

Beta coefficients, 773n-774n 

Betas: 

as criteria of normal curve, 284, 286 
computation of, 256, 259-260 
Bias: 

in sample, 32 
in statistician, 78 
Binomial curve: 
fitting of, 289-292 
skewed, 287-289 
symmetrical, 268-271

Binomial weights, used in moving average
trend, 421-426
Birth rates, 154-155
Black, A. G., 691n

Board of Governors of the Federal Reserve
System Index of Industrial Production, 631

Brumbaugh, M. A., 553n, 640
Buffalo, Index of Business Activity in, 640-641

Burgess, R. W., 231

Burns, Arthur F., 562n, 571

Business cycles (see Cyclical movements) 

C 

Calculating machine, use of, 866-870

Calculation, aids to, 865-871

Calendar, flexible, of working days, 886-887
Calendar variation, 379-382
Campbell, N. R., 268n
Camp-Meidell, 345n-346n

Carl Schleicher : " . S7n 

Causation confused with association, 10 
Central tendency, measures of (see also Mean,
median, and mode, characteristics of;
Price relatives, averages of):
arithmetic mean, 194-207, 232
geometric mean, 221-226, 232
harmonic mean, 226-231, 232
median, 207-210
mode, 212-215

Chaddock, Robert E., 160n, 239n, 686n
Chain index:

circular test applied to, 621-623 
illustration of, 616-621 
purpose of, 596 

Chart construction, rules for (see Bar chart;
Component part charts; Pictorial devices;
Pie diagrams; Semi-logarithmic chart;
Simple curves; Statistical map)

Chart proportions, 87-89
Charts (see also Bar chart; Component part
charts; Pictorial devices; Pie diagrams;
Semi-logarithmic chart; Simple curves;
Statistical map):

bases of comparison, 124 
special puipose, 91 
types of, 71 

Chi (see Skewness, relative: beta measure of)

i ■ I ■ ' ■ . “ of percentages, 333-337 

J.* I ‘i’ of standard deviation, 340- 
343 

defined, 286 
table of values of, 882 
test of goodness of fit, 286-287 
Circular test, 621-623 
Classification: 
bases of, 3 
chronological, 4-5 
concealed, 12 

-i'i ■. t 3 
nil - a ‘ 6 

Cleveland Trust Company Index of American
Business Activity since 1790, 643-644

method of discovering lag, 820 
Codex Book Company, 295n 
Collection of data: 
general plan, 16-18 
methods:

enumeration, 16
registration, 16
procedure outlined, 16
sample, selection of, 26-33
schedule:

making of, 18-26
use of, 33-34

Common logarithms, definition of, 107n
Comparisons:
by tables, 53-56 
graphic bases of, 124-125 
Component part charts: 
bar charts, 133-137 
line diagrams, 195-197 
pie diagrams, 133-137 
Compound-interest curve, 102 
Confidence limits (see Fiducial limits)


Contingency, 687-688

Continuous (see Variable, continuous)

Control of quality, 348-351

Coordinate paper, effaceable ruling, 86-87

Coordinates, 86

Correlation:

and causation, 678-679 
and explained variance, 660-665 
and explained variation, 664n, 693-694, 
739-740 

and horizontal deviations, 665-666 
and measurement of lag (see Lag) 
coefficient, 653, 654-655, 678-679 
first moment correlation, 804n 
index of (see aUo Non-hnear cor i elation) ,69£ 
meaning of, 651-657, 664, 664n-665n, 666-667, 666n, 800-801
means, use of (see Correlation ratio)
multiple (see Multiple correlation)
non-linear (see Non-linear correlation)
of time series (see Time series correlation)
partial (see Partial correlation)
population estimate, 679
practical methods of computation, 667-673
product-moment formula, 666-667, 672-673, 795-802

qualitative distributions, 687-689
ranked data, 685-686
reliability of, 680-685
simple:

grouped data, 673-678
ungrouped data, 654-673
theory of, 654-667

used in index number construction, 644
Correlation ratio, 727-736 
limitations of, 735-736 
Cosgrove, Jessica, 7n 
Cosines, table of, 888 
Cournot, 574n 

Cowden, D. J., 159n, 205n, 266n, 275n, 348n,
639n, 665n, 806n
Cox, Garfield V., 821n
Criterion of likelihood (see Likelihood, criterion of)

Crow, Carl, 33n
Crowder, W. F., 264n, 296n
Croxton, Frederick E., 18n, 124n, 134n, 135n,
160n, 205n, 266n, 275n, 348n, 639n,
665n, 806n

Curve (see Simple curves) 

Curve type, criteria of, 284-286 
Cutts, Jesse M., 627n 
Cycle:
chart, 551

contraction, period of, 566
expansion, period of, 564
pattern of, 566
peak of, 562
recession, month of, 562
reference, 562
revival, month of, 562
specific, 562
stages of, 564-566
trough of, 562
Cyclical movements:
comparison of, 549-552
explained, 367-369
indexes of, 639-644
methods of isolating:

cyclical averages, 560-571
direct, 552-554
harmonic analysis, 554-560
residual, 540-549 


D

Data, statistical (see also Index numbers,
data for):

analysis of, 3, 5-7, 44 
classification of, 3-5 
collection of, 2-3, 16-34 
comparability of, 9, 47-48
insufficient, 10
interpretation of, 7 
meaning of, 1 
period data, 74 
point data, 74 
presentation of: 

by charts, 70-145 (see also Charts)
by tables, 50-68 (see also Tables, statistical)
by text, 49-50
sources of, 44-48
tabulation of, 37-44
Davenport, Donald H., 629, 650
Davies, G. R., 264n, 296n
Dawes, Charles G., 816
Day, E. E., 639, 643
Death rates, 153-154
Deciles, 210-211
Deflating, 382, 573 

Degrees of freedom, 312, 353-356, 711, 730 
De Moivre, Abraham, 266 
Dennis, Samuel T., 627n 
Densities (see Frequency densities) 
Dependent variable (see Variable) 
Determination:
coefficient of, 663-665
index of, 699
ratio of, 728, 734 
separate, coefficient of, 774 
Diagram (see Charts, Scatter diagram) 
Discrete (see Variable, discrete) 

Dispersion: 

absolute (see also Average deviation; Percentile
range; Quartile deviation; Range; Standard
deviation), 235-246
graphic illustration, 234
relative, 246-249
Dodge, H. F., 351n
Doolittle, M. H., 716
Doolittle method:
multiple correlation, 766
third degree curve, 716-720
Double logarithmic paper:

frequency curve plotted on, 193
ogive plotted on (see Pareto curve) 

Dow system, 815-816 

E 

Easter, adjustment for, 509-515 
Edgeworth, 593 
Editing schedules, 35-37 
Edmunds, Harriet, 120n 
Elderton, W. P., 286n, 293n
Elmer, Manuel Conrad, 13n
Emphasis, obtaining of, in tables, 56-57
Enumeration, 16 

Equation type, fitness of, 710-712, 731n 
Estimating equation: 
linear, 652-653, 654, 655-657
multiple, 741, 743-748, 756-757
multiple curvilinear, 778-781

adjusted for variations in some factors, 783

non-linear:

logarithms used, 694-697
reciprocals used, 700-701
second degree curve, 706-709
grouped data, 721-727
third degree curve, 713-720
Estimation, net coefficient of, 741 
Explained sum of squares, 748, 757 
Exponential curve: 
modified, 441-447 
properties of, 101-102, 105-106 
trend fitting, 435-440 
Ezekiel, Mordecai, 664n, 730n, 774n 


F


F, definition of (see also z), 347
F 2 , table of values of, 885
Factor reversal test, 612-614
Falkner, Helen D., 469n
Federal Reserve Bank of New York: 

Index of Trend of Production and Trade, 
616-621 

Monthly Index of Production and Trade, 
634-639 

Ferger, Wirth F., 615-616 
Fiducial limits: 
meaning of, 314
of standard deviation, 340-343
Fiducial probability, 314
Findex, 40 

First moment correlation, 804n 
First order coefficients, 770 
Fisher, Arne, 293n 

Fisher, Irving (see also Ideal index number),
585n, 594, 596n, 810-813
Fisher, R. A., 286n, 291n, 307n, 325n, 344,
346, 435, 683n, 871, 875, 876, 877, 879,
882

Flexibility of price, coefficient of, 705, 705n
Footnotes, in tables, 62
Forecasting:
dangers of, 882
methods of:

cross-cut analysis, 820 
cyclical sequence, 816-820 
economic rhythm, 813-816 
specific historical analogy, 816 
objections to use of correlation procedure 
in, 810 

Formulae and Symbols, Glossary of, 917-944 
Fortune, sampling method of, 30 
Fourth degree curve (see Polynomial series) 
Frequency curves (see also Lorenz curve;
Ogive; Pareto curve):
fitting of, 265-303
plotting of, 77-79

types of (see also Curve type, criteria of;
Normal curve; Skewness; Kurtosis):
J curve, 175-176
reverse J curve, 176
skewed curve, 175
symmetrical curve, 175
U curve, 176

Frequency densities, 183-184
Frequency distribution:
classes, number of, 171-172
class interval:

choice of, 171-172 
plotting when unequal, 177-180 
usually uniform, 171 
class limits:

and method of reporting measurements, 
174 

and points of concentration, 173 
mutually exclusive, 174 
number of, 172-174 
open-end, 179-180 
overlapping, 174 

comparison of frequency distributions:
different class intervals, 183-184
same class intervals, 180-183
continuous variable, 173
cumulative, 184-186
curves:

Lorenz curve, 188-190 
ogive, 184-188 

on fT’d, 193 

on logarithmic probability paper, 295

on semi-logarithmic paper, 293-294
Pareto curve, 190-193 
discrete variable, 173 
mid-value, location of, 170, 173 
plotting of, 77-79 

Frequency distribution and range chart, 97 
Funkhouser, H. Gray, 71n


G 

Gauss, 267 

Gaussian curve (see Normal curve) 

General tables, 52 
Geometric mean: 
definition of, 221 
from grouped data, 222 
from ungrouped data, 221-222 
properties of, 221, 222-223 
uses of:

averaging ratios, 224-225
finding rate of change, 225-226
skewed distributions, 225
Geometric progression (see also Compound
interest curve; Exponential curve):
logarithms of, plotted, 105-106
plotted on semi-logarithmic chart, 106
properties of, 101-102
Glossary of Symbols and Formulae, 917-944

Glover, James W., 871
Gompertz curve:
as law of growth, 365, 448
first differences of, 453
fitting of, 450-452
properties of, 447-448
Goodwin, H. M., 268n
Gram-Charlier series, 299n
Graphic method, advantages and limitations 
of, 70-71 

Graphic presentation (see Graphic method;

Charts) 

Growth curves: 

asymptotic (see Modified exponential;
Gompertz curve; Logistic; Probability
paper, arithmetic)
declining absolute growth, 440-441 
Growth, laws of, 365, 448, 456-458 


H 

Haney, Lewis H., 819
Harbeson, Robert W., 574n
Hardy, Charles O., 821n
Harmonic mean:

compared with arithmetic mean, 227-230

computation of, 226-228

definition of, 226

properties of, 227

uses of:

averaging prices during crop year, 231
numerator-term weights, 227-230
skewed distributions, 230
Hartwell, Jobson, and Kibbee, 22n
Herrman, Helen, 52n

High-low mid-point trend (see Trend, fitting
of, cyclical averages)

Hog-corn ratio, 155-156
Hogg, Margaret H., 630
Holmes, Bert E., 652n
Hotelling, Harold, 253n
Hundred per cent line, 85
Hunt, Stanley B., 555 
Hypothesis (see Null hypothesis) 

I 

Ideal index number:
criticisms of, 615-616
factor reversal test applied to, 612-614
formula, 594-595
time reversal test, 613
Improprieties (see also Percentages, faulty
use, illustrations of):
bias, 7-8
carelessness, 9
causation confused, 10
concealed classification, 12
insufficient data, 10, 160-161
non-comparable data, 9
non-sequitur, 9

omission of important factor, 8-9
unrepresentative data, 10 
Independent variable (see Variable) 

Index, definition of, 575-576 
Indexes (see also Index numbers) : 
chain, 616-621

physical volume of production and trade, 
631-644 

business cycles, 639-644 
price:

changes in cost of living, 629-630
geographical variations in cost of living,
630-631

wholesale commodity prices, 627-629
qualitative changes or differences:
adequacy of state care of mental patients,
644-645

adequacy of state school systems, 645- 
650 

sources of, 650 

Index numbers (see also Indexes) 
aggregative (see Aggregative price index 
numbers; Aggregative quantity index 
numbers) 

averages of relatives (see Price relatives, 
Quantity relatives) 
changing weights, 625-626 
comparison of results, 605-607 
concepts of, 612-616 
data for, 582-586 
formula and use, 614-616
problems in constructing, 576-577
selection of base, 586
substituting commodities, 623-625
tests of, 612-614
uses of, 573-576

Individual importance, coefficients of:
beta, 773n-774n
part correlation, 774n
partial correlation, 742-743, 761-765, 769-772

separate determination, 774n
International Business Machines Corporation, 41n

Irregular variations:
explained, 372-373
frequency curve of, 375-376
smoothing of, 548-549

J 

J curve, 175-176 
Johnson, Norris O., 635n, 639
Joy, Aryness, 606n 

K 

Kappa (see Curve type, criteria of)

Karsten, Karl G., 818n
Kendall, M. G., 271n, 307n, 882
Keuffel and Esser Company, 90n
Keynes, J. M., 592n, 594, 615
Key punch, 40, 42
King, Willford I., 216n, 614
Kondratieff, 376
Kurtosis:

absolute, 258-259 
graphic illustration of, 235 
relative, 259-262 

Kuznets, Simon S., 376, 518n, 731, 732n 

L 

L (see also Likelihood, criterion of), table of
values of, 881
Labels, scale, 87 
Lag: 

distribution of, 810-813 
measurement of, 805-810 
difficulties in, 810
Laspeyres, 592 
Leptokurtic, 235, 258 
Lettering of charts, 89 
Likelihood, criterion of, 359-362
Link relatives, 486-492, 617-619
Literary Digest, sampling method of, 30, 32
Logarithm (see Common logarithm)

Logarithmic chart (see Semi-logarithmic
chart; Double logarithmic paper)
Logarithmic normal curve, fitting of, 293-299
Logarithms, common, table of, 902-916
Logistic curve:

as law of population growth, 456-458
first differences of, 453
fitting of:

by method of selected points, 453-456
by use of reciprocals, 452-453
properties of, 452 
series of, 457-458 
skewed, 458 
Long cycles, 376 
Lorenz curve, 188-190 


M 

Macaulay, Frederick R., 500n-501n, 549n
Mahalanobis, P. C., 879, 881
Map (see Statistical map)

Marshall, Alfred, 574n, 593
Marshall-Edgeworth formula, 594
Mathematical appendix, 829-864
Maximum variation charts, 92
Mean (see Arithmetic mean; Geometric
mean; Harmonic mean)

Mean deviation (see Average deviation)
Mean, median, and mode, characteristics of: 
algebraic treatment, 215-216 
extreme values, effect of, 218-219 
familiarity of, 215 
graphic location of, 215, 216 
irregularity of data, effect of, 219-220 
mathematical properties of, 220 
need for classifying data, 216-217 
open-end classes, effect of, 217-218 
reliability of, 220 

selection of appropriate measure, 220-221 
skewness, effect of, 217-218 
unequal class intervals, effect of, 217
Means, Gardiner C., 574n, 673n
Mean square contingency, coefficient of, 688
Median:

definition of, 207 
graphic location: 

frequency curve, 215, 216 
ogive, 210 

grouped data, 208-210 
same as second quartile, 211 
ungrouped data, 207-208 
Mental Patients, Index of Adequacy of State 
Care of, 644-645 
Mesokurtic, 235, 258 
Methods: 
research, 13-14 
statistical, 1-7

Mills, Frederick C., 435n, 574n
Miner, J. R., 770

Minor means (see Geometric mean; Harmonic
mean; Quadratic mean)
Misuses (see Improprieties)

Mitchell, Wesley C., 367, 562, 564n, 571,
643n, 820

Mode:

betas used in computation of, 212, 257
definition of, 212
graphic location: 

column diagram, 213 
frequency curve, 215, 216 
grouped data: 

difference method, 213-214 
frequency method, 214n 
ungrouped data, 212 
Modified exponential curve:
derivation of formulae for constants, 443-445

fitting of, 445-447
properties of, 441-443
Modified mean:
forms of, 204-205
moving, 535

use of in computing seasonal index, 479-484

Modley, Rudolph, 132n, 134n
Moments:

correction of for grouping error, 262-264
when applicable, 263-264, 301n
first moment, 254 
fourth moment, 259-262 
second moment (see also Variance), 254 
third moment, 254-257 
Moody’s, 816 
Moore, Henry L., 574n
Morse, John W., 663n
Mort, Paul R., 648-649
Moving averages:
cycles, describing of, 500n
irregular movements, smoothing of, 548-549, 549n

moving modified mean, 535
seasonal index, used in computing, 471-478, 500n
trend, used as:

binomially weighted, 421-426
simple, 386-395
Moving average trends:
simple, 386-395 
weighted, 421-426 

Moving seasonal (see Seasonal indexes, mov- 
ing seasonal) 

Multiple axis charts, 95 
Multiple correlation:
and explained variation, 742
coefficient derived from simple and partial
coefficient, 772

coefficient derived from simple coefficients,
763n

curvilinear:

graphical, 784-789
mathematical, 778-784
deviation product sum, check on, 747-748
effect of additional variables on, 767
effect of intercorrelations on, 763, 763n
population estimate of, 775
product sums, check on, 743-747
regarded as simple correlation, 774n
reliability of, 775-778
three independent variables, 765-769
time as an independent variable, 794-795
two independent variables, 756-761
Multiple curvilinear correlation:
graphic, 784-789
limitations of, 788-789
mathematical:

check on computation of product sums, 
781 

coefficient of, 783 
estimating equations, 778-781 

N 

National Bureau of Economic Research, 562, 
570, 571 

Nayer, P. P. N., 881
NEA:

Index of Financial Adequacy, 648-650
ranking, 647-648 
Net balance charts, 91 
Net correlation (see Partial correlation, Indi- 
vidual importance, coefficients of) 
New York Times Weekly Index of Business 
Activity, 641-643 
Neyman, J., 360n, 881
Non-determination, coefficient of, 663n
Non-linear correlation:
logarithms used, 694-699
population estimate, 712, 730-732
reciprocals used, 699-705
second degree curve used:
grouped data, 721-725
ungrouped data, 705-710
third degree curve used, 712-721
Normal, meanings of, 367n, 545-546
Normal curve (see also Logarithmic normal
curve):

and binomial theorem, 271
development from laws of chance, 267-271
fitting of:

areas, 275-280 
ordinates, 271-275 
formula for, 271 

historical development of, 266-267 
table of areas, 873 
table of ordinates, 872 
testing suitability of, 283-287 
Normal curve of error (see Normal curve)
Normal equations:

fourth degree curve, 432
multiple correlation, 747-748, 757
multiple curvilinear correlation, 781
second degree curve, 429 
straight line, 401-404 
third degree curve, 430 
Normal probability curve (see Normal curve) 
Null hypothesis, 310-311 

O 

Observation equations, 401 

Ogive, 184-188 (see also Pareto curve) 

Origin, in chart, 72

Orthogonal polynomials, 433-435 

P 

Paasche, 593 
Palmer, A. DeF., 268n
Part correlation, 774n

Partial correlation (see also Individual importance,
coefficients of):
and explained variation, 743
and net coefficient of estimation, 772
coefficient derived from lower order coefficients,
770-772
meaning of, 742-743
population estimate of, 775
regarded as simple correlation, 774n
reliability of, 776

three independent variables, 769-770
two independent variables, 761-765
Partial determination, coefficient of, 762
Paton, W. A., 180n
Pearl, Raymond, 455n, 456
Pearl-Reed curve (see Logistic curve)
Pearson, E. S., 360n, 881
Pearson, Karl, 251n, 266n, 286n, 651n, 872,
885

Percentage frequency distributions, 180-183
Percentages (see also Ratios):
averaging of, 161-162, 205-206, 232
batting averages, 156-157
entry in stub or caption of table, 63
faulty use of:

averaging improperly, 161-162
base, confusion concerning, 159-160
decimal points misplaced, 161
large percentages, 162
mistakes, arithmetic, 161
small numbers, 160-161
hundred per cent statement, 157-158
index numbers, 151 

rounding to total 100 per cent, 63, 150 
sex ratio, 151-152 
Percentile range, 237 
Percentiles, 210-211 
Period data, 74 
Periodic curve, 555, 559-560 
Periodic movements (see also Seasonal movements;
Seasonal indexes):
explained, 369-372
methods of measuring:
averages adjusted for trend, 469-471
averages of unadjusted data, 464-466
comparison of methods, 491-492
graphic, 484-486
link relative, 486-492
use of logarithms, 490n 
percentages of 466-467 

percentages of . ^ '■*' 71 

percentages of 12-month moving aver- 
age, 471-484 

types of, 369-372, 500-525 
Periodogram, 559 
Periodogram analysis, 555-559 
Persons-Day-Thomas Index of Manufactur- 
ing Production, 639, 643 
Persons, Warren M., 639, 640n, 643 
Phillips, Frank M., 646-647 
Phillips’ Index of Educational Rank, 646-647 
Pictograph (see Pictorial devices) 

Pictorial devices, 131-133 
Pie diagrams, 133-137 
Piser, Leroy M., 509n, 529n 
Platykurtic, 235, 258 
Playfair, William, 71 
Point data, 74 

Polynomial series (see also Straight line 
trend) 

fitted to logarithms, 435-440 
orthogonal polynomials, 433-435 
noI'TO’u cl Ticnds 
properties of, 426-428, 430, 432 
second degree, 426-430 
third degree, 430-432 
used in non-linear correlation: 
multioic. 77^5 -784 
7('“>-7J7 

Population changes, adjustment for, 382, 
411-412 

Population density, 152 
Population estimates (see also Logistic 
curve), 460-461 

Powers of natural numbers, sums of (see 
Sums of powers) 

Precision, measure of, 245 
Prefatory note, 62 
Prescott, Raymond B., 365, 448n 
Presentation of data (see Bar charts; Component 
part chart; Data, statistical; 
Pictorial devices; Pie diagrams; Semi-logarithmic 
chart; Simple curves; Statistical 
maps; Tables, statistical) 

Price changes, adjustment for, 382-383 
Price relatives: 
averages of: 
group weights, 603-604 
procedure, 597-599 
types of average, 599-601 
weighting systems, 601-603 
behavior of, 577-582 
definition of, 576 


Primary source, 44 

Primary trend (see Trend, primary) 

Probability paper: 

arithmetic, used in trend fitting, 458-460 
logarithmic, used with frequency distribu- 
tion, 295 

Proportions, chart, 87-89 
Protractor, percentage, 135, 137 
Punch card, 40-43 

Q 


Quadrants, 73-74 
Quadratic mean, 232 

Qualitative distributions, correlation of, 686- 
689 

Quality, control of, 348-351 

Quantity relatives, averages of, 609-611 

Quartile deviation, 237-238 

Quartiles, 210-211 

Questionnaire, 16 

Quintiles, 210-211 

R 

Ralph C. Coxhead Corporation, 90n 
Range, 236-237 
Range charts, 92-93 
Ranked data, correlation of, 685-686 
Ratio chart (see Semi-logarithmic chart), 107 
Ratios (see also Percentages; Price relatives): 
averaging: 

arithmetically, 161-162 
arithmetic v. geometric mean, 224-225 
geometrically, 225 
calculation of, 146-148 
effect of changing base, 148-149 
faulty use of percentages, 159-162 
recording percentages, 63, 149-150 
uses of, 151-159 
airplane accident ratios, 157 
batting averages, 156 
birth rates, 154-155 
crop yields per acre, 155 
death rates, 153-154 
hog-corn ratio, 155-156 
hundred per cent statement, 157-158 
index numbers, 151 
per capita ratios, 152-153 
persons per family, 152 
population density, 152 
railroad ratios, 158-159 
sex ratio, 151-152 
Reciprocals, table of, 892-901 
Reed, L. J., 456 

Reference cycle analysis, 566-568 
Reference tables (see General tables) 
Registration, 16 

Reliability (see also Analysis of variance; Criterion 
of likelihood; Significance; 
Standard error): 
and control of quality, 348 
of a percentage, 332-337 
of mean: 

known population, 305-310 
small sample, 325-329 
stratified sample, 324-325 
unknown population, 311-314 
of multiple correlation coefficient, 775-778 
of non-linear correlation coefficients, 736- 
738 

of seasonal index, 497-498 
Reliability (cont.): 

of simple correlation coefficient, 680-685 
of standard deviation: 
large sample, 339 
small sample, 340-343 
Remington Rand Business Service, 41n 
Reproduction, 67-68 
Research methods: 
case, 13 
deductive, 14 
experimental, 13 
historical, 13 
inductive, 14 
Reverse J curve, 176 
Rhea, Robert, 815 
Rietz, H. L., 263n, 293n 
Rounding, 63, 149-150 
Rugg, H. O., 872, 873 
Ruling: 
of curves, 85-86 
of tables, 65 

S 

Sample: 
bias in, 32 
purposive, 31 
random, 27-28 
representative, 27 

stratified (see also Stratified sample), 28-31 
stratified purposive, 31 
Scale labels, 87 

Scatter, zones of (see also Standard error of 
estimate) 

linear correlation, 658-660 
non-linear correlation, 697, 702-703 
Scatter diagram, 651-652 
Scatter ratio, 698-699 
Schedule: 
editing, 35-37 
illustrations, 19-22 
making, 18-26 
meaning of term, 16 
use of, 33-34 
Schultz, Henry, 574n 
Score sheet (see Tally sheet) 

Scott, Frances V., 629, 650 
Seasonal indexes (see also Periodic movements, 
methods of measuring): 
amplitude, varying, 518-524 
combination types, 525 
continuity of, 524 
Easter adjustment, 509-515 
logical basis, 527-528 
stable, 467-492 
sudden changes in, 516 
tests of, 497-498 

timing, short time shifts in, 516-518 
weekly, 528-538 

Seasonal movements (see also Periodic move- 
ments) : 

adjustment for: 
by division, 492-497 
by subtraction, 525-527 
nature of, 370-372 
types of: 

amplitude, varying, 518-524 
combination of, 525 
moving, 500-509 
pattern, sudden changes in, 516 
stable, 467-492 

timing, short time shifts in, 516-518 


Seasonal variation (see Seasonal movements) 

Secondary source, 44 

Secondary trend (see Trend, secondary) 

Second degree curve (see Polynomial series) 

Second order coefficients, 771 

Secular trend (see Trend) 

Selected points: 
logistic trend, 453-456 
straight line trend, 397-399 
Semi-averages (see Straight line trend, selected 
points fit) 

Semi-interquartile range (see Quartile deviation) 

Semi-logarithmic chart: 
adapting scale of, 107-109 
applications of, 109, 112-119 
fluctuations, comparison of, 114-117 
increase or decrease, comparing rates of, 
109, 112-114 

interpolation and extrapolation, 118-119 
showing ratios, 117-118 
cycles, 107 

expansion and contraction of scale, 120- 
123 

frequency curve plotted on, 293-294 
interpretation of, 109-111 
phases, 107 

principles of construction, 106-107, 123 
Semi-tabular presentation, 51 
Sex ratio, 151-152 

Sheppard’s corrections (see Moments, correc- 
tion of for grouping error) 

Sheppard’s method of unlike signs, 688-689 
Shewhart, W. A., 264n, 299n, 346n, 351n, 885 
Significance (see also Analysis of variance; 
Criterion of likelihood; Standard 
error): 

levels of, 317 
of deviation of mean: 

from hypothetical population mean, 
314-317 

from known population mean, 308-310 
of difference between means: 
large samples, 317-324 
small samples, 329-331 
small samples, N1 ≠ N2, 330-331 
of difference between percentages, 337- 
339 

of difference between standard deviations: 
N’s are large and N1 = N2, 343-344 
N’s are small and/or N1 ≠ N2, 344-348 
Silhouette charts, 91-92 
Simple curves: 

axes for curve plotting, 71-72 
base line, 81-85 
chart proportions, 87-89 
compared with bar charts, 128-129 
coordinates, 86 

frequency distribution curves, 77-79 

hundred per cent line, 85 

lettering, 89 

origin, 72 

quadrants, 73-74 

ruling of curves, 85-86 

scale labels, 87 

source, 91 

special purpose charts, 91 
time series curves, 74-77 
title, 91 
variables, 72 
zero line, 81 

Sine-cosine curve, 559-560 
Sines, table of, 888 

Skewed curve, 175 

fitting of by use of logarithms, 293-299 
fitting of normal curve with adjustment 
for skewness, 299-303 
Skewness: 
absolute: 

Pearsonian measure of, 251, 253 
percentile measure of, 254 
quartile measure of, 253 
third moment measure of, 254-257 
meaning of, 234-235, 249-251 
relative: 

alpha measure of, 256 
beta measures of, 256-257 
Slide rule, use of, 870-871 
Smith, Bradford B., 817-818 
Snyder, Carl, 631, 638 
Snyder’s Index of the General Price Level, 
631 

Solomons, Leonard M., 253n 
Sorter, electric, 41, 43 
Source note: 
of chart, 91 
of table, 62-63 
Sources of data: 
comparability of, 47-48 
primary, 44 
reliability of, 45-46 
secondary, 44 
selected list of, 825-828 
Spahr, Walter Earl, 13n, 691n 
Specific cycle analysis, 562-566 
Spurr, William A., 484n 
Square roots, table of, 901 
Squares, table of, 901 
Stamp, Sir Josiah, 15n, 161n 
Standard deviation: 
and areas under normal curve, 244 
grouped data, 242-243 
population estimate of, 311-313 
properties of, 243-245 
ungrouped data, 240-242 
used in comparing cyclical movements, 
549-552 

used in index number construction, 643 
Standard error: 
of a percentage, 332 

of coefficient of partial correlation, 776, 
776n 

of coefficient of simple correlation, 680- 
681, 775 

of coefficient of variation, 344n 
of correlation ratio, 736 
of difference between coefficients of varia- 
tion, 344n 

of difference between means: 

N1 = N2, 318 
N1 ≠ N2, 322-323 
paired items, 318n 
of difference between percentages: 

N1 = N2, 337-338 
N1 ≠ N2, 338 

of difference between standard deviations, 
344 

of index of correlation, 736 
of mean: 

finite sample, 307n 
known population, 307-308 
unknown population, 311-313 
of standard deviation, 339 
of Z, 683, 778 


Standard error of estimate: 
effect of additional variables on, 767 
multiple correlation, 742, 758, 772-773 
derived from simple and partial coeffi- 
cients, 772-773 

multiple curvilinear correlation, 783 
non-linear correlation, 697-699, 702-703, 
704, 709-710, 721, 727 
simple correlation, 654, 657-660 
Standard Statistics Co., 821 
Statistical data (see Data, statistical) 
Statistical maps: 
dot maps, 137-142 
hatched maps, 137 
pin maps, 142-145 
Statistical method, 1-7 
Statistical reports, 66-68 
Statistical tables (see Tables, statistical) 
Statistics: 
definition of, 1 
origin of, 2 

Stecher, Margaret Loomis, 630 
Stem, Harold, 124n 
Stencils for lettering, 90 
Straight line trend: 
equation explained, 395-397 
least squares fit: 

adapting equation to monthly data, 408- 
411 

even number of items, 404-405 
fitted to logarithms, 435-440 
logical basis, 399-400, 400n 
normal equations, 401-404 
observation equations, 401 
odd number of items, 404-405 
selected points fit, 397-399 
Stratified sample: 

meaning of, 28-31 
Stryker, Roy E., 134n 
Student, 875 
Summary tables, 52 

Sum of squares (see Explained sum of 
squares) 

Sums of powers of natural numbers, table of, 
889 

Sums of powers of odd natural numbers, 
table of, 890-891 
Swenson, Rinehart John, 13n 
Symbols and formulae, glossary of, 917-944 
Symmetrical curve, 175 

T 

t 

and reliability of correlation coefficients, 
681-682, 778 

and reliability of mean, 327-330 
and significance of difference between 
means, 330-331 
definition of, 327 
distribution, 325-327 
table of values of, 875 
Tables for calculation, list of, 871 
Tables, statistical: 
arrangement of entries, 58-61 
comparisons, making of, 53-56 
emphasis, obtaining of, 56-57 
footnotes, 62 
guiding the eye, 66 
percentages, use of, 63 
prefatory note, 62 
reproduction of, 67-68 
rounding numbers, 63-64, 149-150 





Tables, statistical (cont.): 
ruling, 65 
size and shape, 64 
source notes, 62-63 
title and identification, 62 
totals, 64 

type size and style, 66 
types of, 52-53 
typewritten, 67 
units, 64 

Tabular presentation (see Tables, statistical) 
Tabulation: 

hand sorting, 40 
mechanical, 40-44 
score or tally sheet, 37-40 
Tabulator, electric, 41, 43-44 
Tally sheet, 37-40 
Tchebycheff’s inequality, 345n 

Text tables (see Summary tables) 

Third degree curve (see Polynomial series) 
Thomas, Woodlief, 506n, 632n, 639, 643 
Thorp, Willard L., 378, 566 
Time element in correlation: 
adjustment of series for, 792-794 
V'O 0 . r-ub le correlation, 794-795 
Time reversal test, 613 
Time series: 

characteristics of, 363-379 
calendar variation, 379-382 
correlation of (see Time series correla- 
tion) 

cyclical movements, 367-369 
irregular variations, 372-373, 375-376 
long cycles, 376 
periodic movements, 369-372 
primary trend, 378 
secondary trend, 376-378 
trend, secular, 364-367 
graphic analysis, 372-373 
graphic synthesis, 373-374 
method of analysis, 375 
plotting of, 74-77 
preliminary treatment of, 378-383 
securing comparability of, 383-384 
Time series correlation (see also Lag): 
adjusted cyclical relatives, 795-802 
^ — r^'i'^Vn, 794-795 
. -I I 791 

percentages of normal, 792-795 
percentages of preceding year, 791-792 
702-793 

^ ^ I ' - i‘ >s.)> 

Tippett, L. H. C., 28n, 285n, 286n, 291n, 
312n, 331n, 846n 

Title: 

of chart, 91 
of table, 62 

Totals, where shown in table, 64 
Trend: 

adjustment for, 419-420 
empirical tests of data, 432, 461-462 
explained, 364-367 
fitting of: 

by inspection, 386 
cyclical averages, 412-418 
moving averages (see Moving average 
trends) 
polynomials: 

fitted to logarithms, 435-440 
orthogonal, 433-435 
simple, 426-432 


Trend (cont.): 
fitting of (cont.): 
related series used, 411-412 
series of curves, 411, 457-458 
straight line (see Straight line trend) 
inter-cycle, 562 
intra-cycle, 562 
nature of, 364-367 
primary, 378 
secondary, 376-378 
selection of type, 418-419 
Type size and style in table, 66 
Typewriter, use in table construction: 
in chart lettering, 90n 
in table construction, 67 


U 

U curve, 176 

United Business Service, 822 
United States Bureau of Labor Statistics: 
Index of Changes in Cost of Living, 629 
Index of Wholesale Commodity Prices, 
627-629 

Units, how shown in table, 64 

Unlike signs, Sheppard’s method of, 688-689 


V 

Variable: 

continuous and discrete, 173 
independent and dependent, 72, 652, 665- 
666, 740 
Variance: 

additive quality of, 663 
analysis of: 

column means, 351-359 
equation type, fitness of, 710-712, 734- 
735, 736-738 

multiple correlation, 776-777 
non-linear correlation coefficients, 736- 
738 

partial correlation, 777 
seasonal index, 497n 
simple correlation coefficients, 682-683 
and index of correlation, 693-694 
and simple correlation coefficient, 661-663 
between columns, 354-355 
definition of, 240 

explained, 661-663 

multiple curvilinear correlation (see Multiple 
curvilinear correlation) 
unexplained, 661-663 
within columns, 353-354 
Variation: 

additive quality of, 353 
and correlation coefficient, 664n 
and index of correlation, 693-694 
between columns, 353 
coefficient of (see Dispersion, relative) 
definition of, 240 
explained, 711, 758 
total, 351-353 
unexplained, 711 
within columns, 353-354 
Vari-typer, 90n 

Varying horizontal scale charts, 95 
Verhulst, 456 





W 

Walker, Helen M., 71n, 266n 
Weekly seasonal (see Seasonal indexes, 
weekly) 

Weld, L. D., 311 
Whelpton, Pascal K., 460 
Whipple, George Chandler, 154n, 155n 
Winston, Ellen, 644 
Wood-Kegan Instrument Co., 90n 
Working days, flexible calendar of, 886-887 
Working, Holbrook, 231 
W.P.A. Index of Intercity Differences in Cost 
of Living, 630-631 


Y 

Yates, F., 871, 877, 879 

Yule, G. Udny, 271n, 307n, 882 

Z 

z (see also Variance, analysis of): 
definition of, 344-345 
table of values of, 876-879 
use of in testing significance of difference 
between standard deviations or vari- 
ances, 344-348, 682-683 

Z charts, 93 

Z transformation, 683-685, 778 

Zero line, 81 

Zero order coefficients, 770 



